sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

@@ -472,7 +472,7 @@ identifierComment
;

relationPrimary
-    : tableIdentifier sample? (AS? strictIdentifier)? #tableName
+    : tableIdentifier sample? tableAlias #tableName
| '(' queryNoWith ')' sample? (AS? strictIdentifier) #aliasedQuery
| '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation
| inlineTable #inlineTableDefault2
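
For context, the `tableAlias` production referenced here is defined elsewhere in SqlBase.g4 (not shown in this hunk); presumably it admits an optional alias plus an optional column-alias list. A quick way to exercise the three forms the new rule should accept (a sketch; the table name `t` is illustrative):

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// All three forms should now parse against the #tableName alternative:
CatalystSqlParser.parsePlan("SELECT * FROM t")                  // bare table
CatalystSqlParser.parsePlan("SELECT * FROM t AS x")             // table alias only
CatalystSqlParser.parsePlan("SELECT * FROM t AS x(col1, col2)") // alias + column aliases
```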
@@ -711,7 +711,7 @@ nonReserved
| ADD
| OVER | PARTITION | RANGE | ROWS | PRECEDING | FOLLOWING | CURRENT | ROW | LAST | FIRST | AFTER
| MAP | ARRAY | STRUCT
-    | LATERAL | WINDOW | REDUCE | TRANSFORM | USING | SERDE | SERDEPROPERTIES | RECORDREADER
@gatorsmile (Member), May 24, 2017:
Nit: keep it unchanged.

@maropu (Member Author):
I found that this USING entry caused a failure in PlanParserSuite: with the new tableAlias rule, `u USING (a, b)` in `t JOIN u USING (a, b)` can parse as table `u` aliased to `using` with column aliases `(a, b)` instead of as a USING join. See:
https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala#L336

- joins *** FAILED ***
  == FAIL: Plans do not match ===
   'Project [*]                               'Project [*]
  !+- 'Join Inner                             +- 'Join UsingJoin(Inner,List(a, b))
      :- 'UnresolvedRelation `t`                 :- 'UnresolvedRelation `t`
  !   +- 'SubqueryAlias using                    +- 'UnresolvedRelation `u`
  !      +- 'UnresolvedRelation `u`, [a, b] (PlanTest.scala:97)

@gatorsmile (Member), May 28, 2017:
First, per http://developer.mimer.se/validator/sql-reserved-words.tml, USING has been a reserved word in the ANSI SQL standard since SQL-92.

Second, since 1.2, Hive has provided a flag hive.support.sql11.reserved.keywords for backward compatibility, which defaults to true:

Added In: Hive 1.2.0 with HIVE-6617: https://issues.apache.org/jira/browse/HIVE-6617
Whether to enable support for SQL2011 reserved keywords. When enabled, will support (part of) SQL2011 reserved keywords.

In 2.2, Hive removed this flag and no longer allows users to set it to false. That means users can no longer use these reserved words as identifiers unless they quote them. See: https://issues.apache.org/jira/browse/HIVE-14872

Thus, I think it is safe to remove USING from the non-reserved words.

cc @hvanhovell @cloud-fan
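
For completeness, "quoted identifiers" here means backtick quoting, which keeps reserved words usable as names. A minimal sketch, assuming a SparkSession `spark` and an illustrative table name:

```scala
// Even with USING reserved, it still works as a column name when quoted:
spark.sql("CREATE TABLE words (`using` STRING) USING parquet")
spark.sql("SELECT `using` FROM words")
```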

@maropu (Member Author), May 28, 2017:
BTW, how did we decide on the non-reserved words for Spark? It seems that many of Spark's non-reserved words (e.g., CUBE and GROUPING) are reserved in the ANSI standard.

Contributor:

I have added as much to the non-reserved keyword list as possible (without creating ambiguities). The reason for this is that many datasources (for instance twitter4j) unfortunately use reserved keywords for column names, and working with these was quite cumbersome. I took the pragmatic approach.

If we want to change this, then we need to do the same thing Hive did and create a config flag. We could then remove them in Spark 3.0...

+    | LATERAL | WINDOW | REDUCE | TRANSFORM | SERDE | SERDEPROPERTIES | RECORDREADER
| DELIMITED | FIELDS | TERMINATED | COLLECTION | ITEMS | KEYS | ESCAPED | LINES | SEPARATED
| EXTENDED | REFRESH | CLEAR | CACHE | UNCACHE | LAZY | GLOBAL | TEMPORARY | OPTIONS
| GROUPING | CUBE | ROLLUP
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -593,7 +593,25 @@ class Analyzer(
def resolveRelation(plan: LogicalPlan): LogicalPlan = plan match {
case u: UnresolvedRelation if !isRunningDirectlyOnFiles(u.tableIdentifier) =>
val defaultDatabase = AnalysisContext.get.defaultDatabase
-        val relation = lookupTableFromCatalog(u, defaultDatabase)
+        val foundRelation = lookupTableFromCatalog(u, defaultDatabase)
+
+        // Add `Project` to rename output column names if a query has alias names:
+        // e.g., SELECT col1, col2 FROM testData AS t(col1, col2)
+        val relation = if (u.outputColumnNames.nonEmpty) {
+          val outputAttrs = foundRelation.output
+          // Check that the number of aliases equals the number of columns in the table.
+          if (u.outputColumnNames.size != outputAttrs.size) {
+            u.failAnalysis(s"Number of column aliases does not match number of columns. " +
+              s"Table name: ${u.tableName}; number of column aliases: " +
+              s"${u.outputColumnNames.size}; number of columns: ${outputAttrs.size}.")
+          }
+          val aliases = outputAttrs.zip(u.outputColumnNames).map {
+            case (attr, name) => Alias(attr, name)()
+          }
+          Project(aliases, foundRelation)
+        } else {
+          foundRelation
+        }
resolveRelation(relation)
// The view's child should be a logical plan parsed from the `desc.viewText`, the variable
// `viewText` should be defined, or else we throw an error on the generation of the View
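
Taken together, this makes the analyzer wrap an aliased relation in a renaming `Project`. A sketch of the observable behavior, assuming a SparkSession `spark` and the `testData` temp view created in the SQL tests below:

```scala
// Aliases only rename the output columns; data and types are untouched.
val df = spark.sql("SELECT * FROM testData AS t(col1, col2)")
df.schema.fieldNames // Array("col1", "col2"), per the golden results below

// A mismatched alias count now fails analysis with the new message:
// "Number of column aliases does not match number of columns. ..."
spark.sql("SELECT * FROM testData AS t(col1)") // throws AnalysisException
```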
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala

@@ -131,8 +131,9 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] {
val outputAttrs = resolvedFunc.output
// Checks if the number of the aliases is equal to expected one
if (u.outputNames.size != outputAttrs.size) {
u.failAnalysis(s"expected ${outputAttrs.size} columns but " +
s"found ${u.outputNames.size} columns")
u.failAnalysis(s"Number of given aliases does not match number of output columns. " +
s"Function name: ${u.functionName}; number of aliases: " +
s"${u.outputNames.size}; number of output columns: ${outputAttrs.size}.")
}
@maropu (Member Author), May 28, 2017:
This fix is not directly related to this PR, but I changed this message to be consistent with the new one above: https://github.com/apache/spark/pull/18079/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2R604
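
The reworded message mirrors the relation-alias error above. A sketch of how it surfaces (expected text per AnalysisSuite below; assumes a SparkSession `spark`):

```scala
// range(10) produces a single output column, so two aliases must fail:
spark.sql("SELECT * FROM range(10) t(a, b)")
// org.apache.spark.sql.AnalysisException: Number of given aliases does not match
// number of output columns. Function name: range; number of aliases: 2;
// number of output columns: 1.
```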

val aliases = outputAttrs.zip(u.outputNames).map {
case (attr, name) => Alias(attr, name)()
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala

@@ -36,8 +36,21 @@ class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, function: Str

/**
* Holds the name of a relation that has yet to be looked up in a catalog.
 * Alias names can also be assigned to the columns of the relation:
 * {{{
 *   // Assign alias names
 *   SELECT col1, col2 FROM testData AS t(col1, col2);
 * }}}
 *
 * @param tableIdentifier table name
 * @param outputColumnNames alias names for the columns. If these names are given, the analyzer
 *                          adds a [[Project]] to rename the columns.
*/
-case class UnresolvedRelation(tableIdentifier: TableIdentifier) extends LeafNode {
+case class UnresolvedRelation(
+    tableIdentifier: TableIdentifier,
+    outputColumnNames: Seq[String] = Seq.empty)
+  extends LeafNode {

/** Returns a `.` separated name for this relation. */
def tableName: String = tableIdentifier.unquotedString
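
For reference, the extended node can be constructed directly; the `Seq.empty` default keeps existing one-argument call sites source-compatible (usage mirrors the PlanParserSuite test below):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation

// With column aliases, as produced for "FROM testData AS t(col1, col2)":
val aliased = UnresolvedRelation(TableIdentifier("testData"), Seq("col1", "col2"))

// Without aliases, exactly as before this change:
val plain = UnresolvedRelation(TableIdentifier("testData"))
```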

@@ -71,6 +84,11 @@ case class UnresolvedInlineTable(
* // Assign alias names
* select t.a from range(10) t(a);
* }}}
 *
 * @param functionName name of this table-valued function
 * @param functionArgs list of function arguments
 * @param outputNames alias names of the function's output columns. If these names are given,
 *                    the analyzer adds a [[Project]] to rename the output columns.
*/
case class UnresolvedTableValuedFunction(
functionName: String,
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

@@ -676,12 +676,16 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
* Create an aliased table reference. This is typically used in FROM clauses.
*/
override def visitTableName(ctx: TableNameContext): LogicalPlan = withOrigin(ctx) {
-    val table = UnresolvedRelation(visitTableIdentifier(ctx.tableIdentifier))
-
-    val tableWithAlias = Option(ctx.strictIdentifier).map(_.getText) match {
-      case Some(strictIdentifier) =>
-        SubqueryAlias(strictIdentifier, table)
-      case _ => table
+    val tableId = visitTableIdentifier(ctx.tableIdentifier)
+    val table = if (ctx.tableAlias.identifierList != null) {
+      UnresolvedRelation(tableId, visitIdentifierList(ctx.tableAlias.identifierList))
+    } else {
+      UnresolvedRelation(tableId)
+    }
+    val tableWithAlias = if (ctx.tableAlias.strictIdentifier != null) {
+      SubqueryAlias(ctx.tableAlias.strictIdentifier.getText, table)
+    } else {
+      table
}
tableWithAlias.optionalMap(ctx.sample)(withSample)
}
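
In summary, the rewritten visitTableName maps the three alias forms to these plan shapes (a sketch; cf. the PlanParserSuite test below):

```scala
// FROM t               => UnresolvedRelation(`t`)
// FROM t AS x          => SubqueryAlias("x", UnresolvedRelation(`t`))
// FROM t AS x(c1, c2)  => SubqueryAlias("x", UnresolvedRelation(`t`, Seq("c1", "c2")))
```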
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala

@@ -465,6 +465,23 @@ class AnalysisSuite extends AnalysisTest with ShouldMatchers {
assertAnalysisSuccess(rangeWithAliases(2 :: 6 :: 2 :: Nil, "c" :: Nil))
assertAnalysisError(
rangeWithAliases(3 :: Nil, "a" :: "b" :: Nil),
Seq("expected 1 columns but found 2 columns"))
Seq("Number of given aliases does not match number of output columns. "
+ "Function name: range; number of aliases: 2; number of output columns: 1."))
}

test("SPARK-20841 Support table column aliases in FROM clause") {
def tableColumnsWithAliases(outputNames: Seq[String]): LogicalPlan = {
SubqueryAlias("t", UnresolvedRelation(TableIdentifier("TaBlE3"), outputNames))
.select(star())
}
assertAnalysisSuccess(tableColumnsWithAliases("col1" :: "col2" :: "col3" :: "col4" :: Nil))
assertAnalysisError(
tableColumnsWithAliases("col1" :: Nil),
Seq("Number of column aliases does not match number of columns. Table name: TaBlE3; " +
"number of column aliases: 1; number of columns: 4."))
assertAnalysisError(
tableColumnsWithAliases("col1" :: "col2" :: "col3" :: "col4" :: "col5" :: Nil),
Seq("Number of column aliases does not match number of columns. Table name: TaBlE3; " +
"number of column aliases: 5; number of columns: 4."))
}
}
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisTest.scala

@@ -35,6 +35,7 @@ trait AnalysisTest extends PlanTest {
val catalog = new SessionCatalog(new InMemoryCatalog, EmptyFunctionRegistry, conf)
catalog.createTempView("TaBlE", TestRelations.testRelation, overrideIfExists = true)
catalog.createTempView("TaBlE2", TestRelations.testRelation2, overrideIfExists = true)
catalog.createTempView("TaBlE3", TestRelations.testRelation3, overrideIfExists = true)
new Analyzer(catalog, conf) {
override val extendedResolutionRules = EliminateSubqueryAliases :: Nil
}
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala

@@ -17,8 +17,8 @@

package org.apache.spark.sql.catalyst.parser

-import org.apache.spark.sql.catalyst.FunctionIdentifier
-import org.apache.spark.sql.catalyst.analysis.{UnresolvedGenerator, UnresolvedInlineTable, UnresolvedTableValuedFunction}
+import org.apache.spark.sql.catalyst.{FunctionIdentifier, TableIdentifier}
+import org.apache.spark.sql.catalyst.analysis.{UnresolvedGenerator, UnresolvedInlineTable, UnresolvedRelation, UnresolvedTableValuedFunction}
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans._
import org.apache.spark.sql.catalyst.plans.logical._
@@ -493,6 +493,13 @@ class PlanParserSuite extends PlanTest {
.select(star()))
}

test("SPARK-20841 Support table column aliases in FROM clause") {
assertEqual(
"SELECT * FROM testData AS t(col1, col2)",
SubqueryAlias("t", UnresolvedRelation(TableIdentifier("testData"), Seq("col1", "col2")))
.select(star()))
}

test("inline table") {
assertEqual("values 1, 2, 3, 4",
UnresolvedInlineTable(Seq("col1"), Seq(1, 2, 3, 4).map(x => Seq(Literal(x)))))
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala

@@ -49,7 +49,7 @@ class TableIdentifierParserSuite extends SparkFunSuite {
"insert", "int", "into", "is", "lateral", "like", "local", "none", "null",
"of", "order", "out", "outer", "partition", "percent", "procedure", "range", "reads", "revoke",
"rollup", "row", "rows", "set", "smallint", "table", "timestamp", "to", "trigger",
"true", "truncate", "update", "user", "using", "values", "with", "regexp", "rlike",
Member:
Nit: keep it unchanged.

@maropu (Member Author):
ditto

"true", "truncate", "update", "user", "values", "with", "regexp", "rlike",
"bigint", "binary", "boolean", "current_date", "current_timestamp", "date", "double", "float",
"int", "smallint", "timestamp", "at")

sql/core/src/test/resources/sql-tests/inputs/table-aliases.sql (new file, 17 additions)

@@ -0,0 +1,17 @@
-- Test data.
CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES (1, 1), (1, 2), (2, 1) AS testData(a, b);

-- Table column aliases in FROM clause
SELECT * FROM testData AS t(col1, col2) WHERE col1 = 1;

SELECT * FROM testData AS t(col1, col2) WHERE col1 = 2;

SELECT col1 AS k, SUM(col2) FROM testData AS t(col1, col2) GROUP BY k;

-- Aliasing the wrong number of columns in the FROM clause
SELECT * FROM testData AS t(col1, col2, col3);

SELECT * FROM testData AS t(col1);

-- Check alias duplication
SELECT a AS col1, b AS col2 FROM testData AS t(c, d);
sql/core/src/test/resources/sql-tests/results/table-aliases.sql.out (new file)
@@ -0,0 +1,63 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 7


-- !query 0
CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES (1, 1), (1, 2), (2, 1) AS testData(a, b)
-- !query 0 schema
struct<>
-- !query 0 output



-- !query 1
SELECT * FROM testData AS t(col1, col2) WHERE col1 = 1
-- !query 1 schema
struct<col1:int,col2:int>
-- !query 1 output
1 1
1 2


-- !query 2
SELECT * FROM testData AS t(col1, col2) WHERE col1 = 2
-- !query 2 schema
struct<col1:int,col2:int>
-- !query 2 output
2 1


-- !query 3
SELECT col1 AS k, SUM(col2) FROM testData AS t(col1, col2) GROUP BY k
-- !query 3 schema
struct<k:int,sum(col2):bigint>
-- !query 3 output
1 3
2 1


-- !query 4
SELECT * FROM testData AS t(col1, col2, col3)
-- !query 4 schema
struct<>
-- !query 4 output
org.apache.spark.sql.AnalysisException
Number of column aliases does not match number of columns. Table name: testData; number of column aliases: 3; number of columns: 2.; line 1 pos 14


-- !query 5
SELECT * FROM testData AS t(col1)
-- !query 5 schema
struct<>
-- !query 5 output
org.apache.spark.sql.AnalysisException
Number of column aliases does not match number of columns. Table name: testData; number of column aliases: 1; number of columns: 2.; line 1 pos 14


-- !query 6
SELECT a AS col1, b AS col2 FROM testData AS t(c, d)
-- !query 6 schema
struct<>
-- !query 6 output
org.apache.spark.sql.AnalysisException
cannot resolve '`a`' given input columns: [c, d]; line 1 pos 7
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala

@@ -74,13 +74,13 @@ object TPCDSQueryBenchmark {
// per-row processing time for those cases.
val queryRelations = scala.collection.mutable.HashSet[String]()
spark.sql(queryString).queryExecution.logical.map {
-      case ur @ UnresolvedRelation(t: TableIdentifier) =>
+      case UnresolvedRelation(t: TableIdentifier, _) =>
queryRelations.add(t.table)
case lp: LogicalPlan =>
lp.expressions.foreach { _ foreach {
case subquery: SubqueryExpression =>
subquery.plan.foreach {
-            case ur @ UnresolvedRelation(t: TableIdentifier) =>
+            case UnresolvedRelation(t: TableIdentifier, _) =>
queryRelations.add(t.table)
case _ =>
}
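
Why the extra `_`: adding a second field to the case class changes its generated `unapply`, so every one-argument pattern like `case UnresolvedRelation(t)` must now bind or ignore `outputColumnNames` (the unused `ur @` binder is dropped at the same time). A minimal sketch:

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation

UnresolvedRelation(TableIdentifier("store_sales")) match {
  case UnresolvedRelation(t, _) => println(t.table) // prints "store_sales"
}
```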
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala

@@ -544,7 +544,7 @@ private[hive] class TestHiveQueryExecution(
// Make sure any test tables referenced are loaded.
val referencedTables =
describedTables ++
-      logical.collect { case UnresolvedRelation(tableIdent) => tableIdent.table }
+      logical.collect { case UnresolvedRelation(tableIdent, _) => tableIdent.table }
val referencedTestTables = referencedTables.filter(sparkSession.testTables.contains)
logDebug(s"Query references test tables: ${referencedTestTables.mkString(", ")}")
referencedTestTables.foreach(sparkSession.loadTestTable)