[SPARK-16041][SQL] Disallow Duplicate Columns in partitionBy, bucketBy and sortBy #13756

gatorsmile · 2016-06-18T05:20:18Z

What changes were proposed in this pull request?

Duplicate columns should not be allowed in partitionBy, bucketBy, or sortBy in DataFrameWriter. Duplicate columns can cause unpredictable results, for example a resolution failure.

This PR detects the duplicates and throws exceptions with appropriate messages.
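
To illustrate the kind of check described above, here is a minimal standalone sketch. It is not the code in this patch: the object name, the `caseSensitive` flag, the message text, and the use of `IllegalArgumentException` (instead of Spark's own analysis error) are all assumptions made so that the example runs without Spark on the classpath.

```scala
import java.util.Locale

// Hypothetical helper, for illustration only; not the implementation in this PR.
object DuplicateCheck {
  /**
   * Throws if `columnNames` contains the same name more than once.
   * `columnType` only labels the error message (e.g. "partitionBy").
   */
  def checkDuplicates(
      columnNames: Seq[String],
      columnType: String,
      caseSensitive: Boolean = false): Unit = {
    // Fold names to lower case unless the caller asks for exact matching.
    val normalized =
      if (caseSensitive) columnNames else columnNames.map(_.toLowerCase(Locale.ROOT))
    // Keep every name that occurs more than once.
    val duplicates = normalized
      .groupBy(identity)
      .collect { case (name, occurrences) if occurrences.size > 1 => name }
      .toSeq
      .sorted
    if (duplicates.nonEmpty) {
      throw new IllegalArgumentException(
        s"Found duplicate column(s) in $columnType: ${duplicates.mkString(", ")}")
    }
  }
}
```

With this sketch, `DuplicateCheck.checkDuplicates(Seq("year", "Year"), "partitionBy")` throws, while `Seq("year", "month")` passes.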

How was this patch tested?

Added test cases in DataFrameReaderWriterSuite

SparkQA · 2016-06-18T07:11:22Z

Test build #60764 has finished for PR 13756 at commit 83082ff.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-06-18T15:51:38Z

@cloud-fan @yhuai @liancheng @clockfly Could you please review this PR? Thanks!

liancheng · 2016-06-20T12:22:43Z

Do you mean bucketBy instead of blockBy in the PR title?

liancheng · 2016-06-20T12:33:35Z

I think it would be better to move these checks to the analyzer, so that the SQL equivalents of those structures (partitioning and bucketing) can also benefit from them.

gatorsmile · 2016-06-20T16:18:00Z

: ) Sharp eye! blockBy is a parameter name I used in another project. Sorry for the wrong name; I have made this mistake more than once.

Let me try to find a common place in Analyzer for this. Thanks!

gatorsmile · 2016-06-22T03:35:27Z

It sounds like the PreWriteCheck rule is a good home for this check. Let me add it now.

SparkQA · 2016-06-22T06:28:33Z

Test build #61005 has finished for PR 13756 at commit ae15ea9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-06-23T01:08:35Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala

        // The relation in l is not an InsertableRelation.
        failAnalysis(s"$l does not allow insertion.")

+      case c: CreateTableUsing =>


How about CreateTableCommand and CreateHiveTableAsSelectLogicalPlan?

: ) True. Let me add them now.

Only found one case: CREATE TABLE with PARTITION BY. Let me explain what I found.

First, CREATE TABLE command does not support bucketSpec. See code

Second, CREATE TABLE AS SELECT, which can generate CreateHiveTableAsSelectLogicalPlan, does not allow users to specify the schema, and the schema is where the partitionBy columns would be declared.

Let me know if anything is still missing. Thanks!

SparkQA · 2016-06-23T08:17:03Z

Test build #61099 has finished for PR 13756 at commit 24edb5f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-06-23T09:04:39Z

I'm wondering whether it's possible to concentrate the error-checking logic for table creation in one place. For example, we check duplicated table column names in the parser for SQL statements (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L918) and in the command for DataFrameWriter (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L77).

Some checks are only valid for SQL statements, e.g. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L925

It would be good if we could abstract the common pattern, put all the error-checking logic together, and have a separate test suite for it.
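
A rough sketch of what "one place" could look like, purely as an illustration. `TableSpec` and `TableSpecChecks` are hypothetical names, not existing Spark classes, and the error wording is made up. The idea is that both the SQL path and the DataFrameWriter path would build the same description of the table and hand it to a single validator, which can then be unit-tested on its own.

```scala
import java.util.Locale

// Hypothetical, Spark-free description of a table being created.
final case class TableSpec(
    columns: Seq[String],
    partitionColumns: Seq[String] = Nil,
    bucketColumns: Seq[String] = Nil,
    sortColumns: Seq[String] = Nil)

// Hypothetical single home for table-creation checks.
object TableSpecChecks {
  // Names that occur more than once, compared case-insensitively (an assumption).
  private def duplicates(names: Seq[String]): Seq[String] =
    names
      .groupBy(_.toLowerCase(Locale.ROOT))
      .collect { case (name, occurrences) if occurrences.size > 1 => name }
      .toSeq
      .sorted

  /** Returns one message per problem found; an empty result means the spec is valid. */
  def validate(spec: TableSpec): Seq[String] = {
    val namedLists = Seq(
      "table definition" -> spec.columns,
      "partitionBy" -> spec.partitionColumns,
      "bucketBy" -> spec.bucketColumns,
      "sortBy" -> spec.sortColumns)
    namedLists.flatMap { case (where, names) =>
      duplicates(names).map(d => s"Duplicate column '$d' in $where")
    }
  }
}
```

Returning messages instead of throwing keeps the checks easy to test in isolation; each caller can decide how to surface them.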

gatorsmile · 2016-06-23T16:58:29Z

Your suggestion is very good! Let me try it tonight. Thanks!

gatorsmile · 2016-06-24T06:13:16Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala


      case p @ CreateHiveTableAsSelectLogicalPlan(table, child, allowExisting) =>
+        // Ensuring whether no duplicate name is used in table definition
+        checkDuplicates(child.output.map(_.name), s"table definition of ${table.identifier}")


PreWriteCheck is executed after conversion from CreateHiveTableAsSelectLogicalPlan to execution.CreateHiveTableAsSelectCommand. PreWriteCheck is unable to access the Hive package execution.CreateHiveTableAsSelectCommand. Thus, I have no clue how to move this into PreWriteCheck. Introduce a new rule?

Actually, in Hive, there is a stage called Semantic Analysis, which is done before Analyzer but after Parser. That stage is for checking these semantic errors. Not sure whether we should add a similar concept into Spark SQL?

Sounds like a good idea. Do you have more information about the Semantic Analysis phase? What kind of checks can be done there?

Actually, my previous comment was not accurate. In Hive, they just split what our Analyzer does into two phases: Semantic Analyzer and Logical Plan Generator. The Semantic Analyzer also resolves relations through the catalog. Please ignore what I said above. : )

To answer your last question, let me post the error messages generated by the semantic analyzer. The range of error codes from 10000 to 19999 is used by the semantic analyzer:

INVALID_TABLE(10001, "Table not found", "42S02"), INVALID_COLUMN(10002, "Invalid column reference"), INVALID_INDEX(10003, "Invalid index"), INVALID_TABLE_OR_COLUMN(10004, "Invalid table alias or column reference"), AMBIGUOUS_TABLE_OR_COLUMN(10005, "Ambiguous table alias or column reference"), INVALID_PARTITION(10006, "Partition not found"), AMBIGUOUS_COLUMN(10007, "Ambiguous column reference"), AMBIGUOUS_TABLE_ALIAS(10008, "Ambiguous table alias"), INVALID_TABLE_ALIAS(10009, "Invalid table alias"), NO_TABLE_ALIAS(10010, "No table alias"), INVALID_FUNCTION(10011, "Invalid function"), INVALID_FUNCTION_SIGNATURE(10012, "Function argument type mismatch"), INVALID_OPERATOR_SIGNATURE(10013, "Operator argument type mismatch"), INVALID_ARGUMENT(10014, "Wrong arguments"), INVALID_ARGUMENT_LENGTH(10015, "Arguments length mismatch", "21000"), INVALID_ARGUMENT_TYPE(10016, "Argument type mismatch"), INVALID_JOIN_CONDITION_1(10017, "Both left and right aliases encountered in JOIN"), INVALID_JOIN_CONDITION_2(10018, "Neither left nor right aliases encountered in JOIN"), INVALID_JOIN_CONDITION_3(10019, "OR not supported in JOIN currently"), INVALID_TRANSFORM(10020, "TRANSFORM with other SELECT columns not supported"), UNSUPPORTED_MULTIPLE_DISTINCTS(10022, "DISTINCT on different columns not supported" + " with skew in data"), NO_SUBQUERY_ALIAS(10023, "No alias for subquery"), NO_INSERT_INSUBQUERY(10024, "Cannot insert in a subquery. Inserting to table "), NON_KEY_EXPR_IN_GROUPBY(10025, "Expression not in GROUP BY key"), INVALID_XPATH(10026, "General . and [] operators are not supported"), INVALID_PATH(10027, "Invalid path"), ILLEGAL_PATH(10028, "Path is not legal"), INVALID_NUMERICAL_CONSTANT(10029, "Invalid numerical constant"), INVALID_ARRAYINDEX_TYPE(10030, "Not proper type for index of ARRAY. 
Currently, only integer type is supported"), INVALID_MAPINDEX_CONSTANT(10031, "Non-constant expression for map indexes not supported"), INVALID_MAPINDEX_TYPE(10032, "MAP key type does not match index expression type"), NON_COLLECTION_TYPE(10033, "[] not valid on non-collection types"), SELECT_DISTINCT_WITH_GROUPBY(10034, "SELECT DISTINCT and GROUP BY can not be in the same query"), COLUMN_REPEATED_IN_PARTITIONING_COLS(10035, "Column repeated in partitioning columns"), DUPLICATE_COLUMN_NAMES(10036, "Duplicate column name:"), INVALID_BUCKET_NUMBER(10037, "Bucket number should be bigger than zero"), COLUMN_REPEATED_IN_CLUSTER_SORT(10038, "Same column cannot appear in CLUSTER BY and SORT BY"), SAMPLE_RESTRICTION(10039, "Cannot SAMPLE on more than two columns"), SAMPLE_COLUMN_NOT_FOUND(10040, "SAMPLE column not found"), NO_PARTITION_PREDICATE(10041, "No partition predicate found"), INVALID_DOT(10042, ". Operator is only supported on struct or list of struct types"), INVALID_TBL_DDL_SERDE(10043, "Either list of columns or a custom serializer should be specified"), TARGET_TABLE_COLUMN_MISMATCH(10044, "Cannot insert into target table because column number/types are different"), TABLE_ALIAS_NOT_ALLOWED(10045, "Table alias not allowed in sampling clause"), CLUSTERBY_DISTRIBUTEBY_CONFLICT(10046, "Cannot have both CLUSTER BY and DISTRIBUTE BY clauses"), ORDERBY_DISTRIBUTEBY_CONFLICT(10047, "Cannot have both ORDER BY and DISTRIBUTE BY clauses"), CLUSTERBY_SORTBY_CONFLICT(10048, "Cannot have both CLUSTER BY and SORT BY clauses"), ORDERBY_SORTBY_CONFLICT(10049, "Cannot have both ORDER BY and SORT BY clauses"), CLUSTERBY_ORDERBY_CONFLICT(10050, "Cannot have both CLUSTER BY and ORDER BY clauses"), NO_LIMIT_WITH_ORDERBY(10051, "In strict mode, if ORDER BY is specified, " + "LIMIT must also be specified"), NO_CARTESIAN_PRODUCT(10052, "In strict mode, cartesian product is not allowed. 
" + "If you really want to perform the operation, set hive.mapred.mode=nonstrict"), UNION_NOTIN_SUBQ(10053, "Top level UNION is not supported currently; " + "use a subquery for the UNION"), INVALID_INPUT_FORMAT_TYPE(10054, "Input format must implement InputFormat"), INVALID_OUTPUT_FORMAT_TYPE(10055, "Output Format must implement HiveOutputFormat, " + "otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat"), NO_VALID_PARTN(10056, "The query does not reference any valid partition. " + "To run this query, set hive.mapred.mode=nonstrict"), NO_OUTER_MAPJOIN(10057, "MAPJOIN cannot be performed with OUTER JOIN"), INVALID_MAPJOIN_HINT(10058, "All tables are specified as map-table for join"), INVALID_MAPJOIN_TABLE(10059, "Result of a union cannot be a map table"), NON_BUCKETED_TABLE(10060, "Sampling expression needed for non-bucketed table"), BUCKETED_NUMERATOR_BIGGER_DENOMINATOR(10061, "Numerator should not be bigger than " + "denominator in sample clause for table"), NEED_PARTITION_ERROR(10062, "Need to specify partition columns because the destination " + "table is partitioned"), CTAS_CTLT_COEXISTENCE(10063, "Create table command does not allow LIKE and AS-SELECT in " + "the same command"), LINES_TERMINATED_BY_NON_NEWLINE(10064, "LINES TERMINATED BY only supports " + "newline '\\n' right now"), CTAS_COLLST_COEXISTENCE(10065, "CREATE TABLE AS SELECT command cannot specify " + "the list of columns " + "for the target table"), CTLT_COLLST_COEXISTENCE(10066, "CREATE TABLE LIKE command cannot specify the list of columns for " + "the target table"), INVALID_SELECT_SCHEMA(10067, "Cannot derive schema from the select-clause"), CTAS_PARCOL_COEXISTENCE(10068, "CREATE-TABLE-AS-SELECT does not support " + "partitioning in the target table "), CTAS_MULTI_LOADFILE(10069, "CREATE-TABLE-AS-SELECT results in multiple file load"), CTAS_EXTTBL_COEXISTENCE(10070, "CREATE-TABLE-AS-SELECT cannot create external table"), INSERT_EXTERNAL_TABLE(10071, "Inserting 
into a external table is not allowed"), DATABASE_NOT_EXISTS(10072, "Database does not exist:"), TABLE_ALREADY_EXISTS(10073, "Table already exists:", "42S02"), COLUMN_ALIAS_ALREADY_EXISTS(10074, "Column alias already exists:", "42S02"), UDTF_MULTIPLE_EXPR(10075, "Only a single expression in the SELECT clause is " + "supported with UDTF's"), @Deprecated UDTF_REQUIRE_AS(10076, "UDTF's require an AS clause"), UDTF_NO_GROUP_BY(10077, "GROUP BY is not supported with a UDTF in the SELECT clause"), UDTF_NO_SORT_BY(10078, "SORT BY is not supported with a UDTF in the SELECT clause"), UDTF_NO_CLUSTER_BY(10079, "CLUSTER BY is not supported with a UDTF in the SELECT clause"), UDTF_NO_DISTRIBUTE_BY(10080, "DISTRUBTE BY is not supported with a UDTF in the SELECT clause"), UDTF_INVALID_LOCATION(10081, "UDTF's are not supported outside the SELECT clause, nor nested " + "in expressions"), UDTF_LATERAL_VIEW(10082, "UDTF's cannot be in a select expression when there is a lateral view"), UDTF_ALIAS_MISMATCH(10083, "The number of aliases supplied in the AS clause does not match the " + "number of columns output by the UDTF"), UDF_STATEFUL_INVALID_LOCATION(10084, "Stateful UDF's can only be invoked in the SELECT list"), LATERAL_VIEW_WITH_JOIN(10085, "JOIN with a LATERAL VIEW is not supported"), LATERAL_VIEW_INVALID_CHILD(10086, "LATERAL VIEW AST with invalid child"), OUTPUT_SPECIFIED_MULTIPLE_TIMES(10087, "The same output cannot be present multiple times: "), INVALID_AS(10088, "AS clause has an invalid number of aliases"), VIEW_COL_MISMATCH(10089, "The number of columns produced by the SELECT clause does not match the " + "number of column names specified by CREATE VIEW"), DML_AGAINST_VIEW(10090, "A view cannot be used as target table for LOAD or INSERT"), ANALYZE_VIEW(10091, "ANALYZE is not supported for views"), VIEW_PARTITION_TOTAL(10092, "At least one non-partitioning column must be present in view"), VIEW_PARTITION_MISMATCH(10093, "Rightmost columns in view output do not match " + 
"PARTITIONED ON clause"), PARTITION_DYN_STA_ORDER(10094, "Dynamic partition cannot be the parent of a static partition"), DYNAMIC_PARTITION_DISABLED(10095, "Dynamic partition is disabled. Either enable it by setting " + "hive.exec.dynamic.partition=true or specify partition column values"), DYNAMIC_PARTITION_STRICT_MODE(10096, "Dynamic partition strict mode requires at least one " + "static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict"), NONEXISTPARTCOL(10098, "Non-Partition column appears in the partition specification: "), UNSUPPORTED_TYPE(10099, "DATETIME type isn't supported yet. Please use " + "DATE or TIMESTAMP instead"), CREATE_NON_NATIVE_AS(10100, "CREATE TABLE AS SELECT cannot be used for a non-native table"), LOAD_INTO_NON_NATIVE(10101, "A non-native table cannot be used as target for LOAD"), LOCKMGR_NOT_SPECIFIED(10102, "Lock manager not specified correctly, set hive.lock.manager"), LOCKMGR_NOT_INITIALIZED(10103, "Lock manager could not be initialized, check hive.lock.manager "), LOCK_CANNOT_BE_ACQUIRED(10104, "Locks on the underlying objects cannot be acquired. " + "retry after some time"), ZOOKEEPER_CLIENT_COULD_NOT_BE_INITIALIZED(10105, "Check hive.zookeeper.quorum " + "and hive.zookeeper.client.port"), OVERWRITE_ARCHIVED_PART(10106, "Cannot overwrite an archived partition. " + "Unarchive before running this command"), ARCHIVE_METHODS_DISABLED(10107, "Archiving methods are currently disabled. " + "Please see the Hive wiki for more information about enabling archiving"), ARCHIVE_ON_MULI_PARTS(10108, "ARCHIVE can only be run on a single partition"), UNARCHIVE_ON_MULI_PARTS(10109, "ARCHIVE can only be run on a single partition"), ARCHIVE_ON_TABLE(10110, "ARCHIVE can only be run on partitions"), RESERVED_PART_VAL(10111, "Partition value contains a reserved substring"), OFFLINE_TABLE_OR_PARTITION(10113, "Query against an offline table or partition"), OUTERJOIN_USES_FILTERS(10114, "The query results could be wrong. 
" + "Turn on hive.outerjoin.supports.filters"), NEED_PARTITION_SPECIFICATION(10115, "Table is partitioned and partition specification is needed"), INVALID_METADATA(10116, "The metadata file could not be parsed "), NEED_TABLE_SPECIFICATION(10117, "Table name could be determined; It should be specified "), PARTITION_EXISTS(10118, "Partition already exists"), TABLE_DATA_EXISTS(10119, "Table exists and contains data files"), INCOMPATIBLE_SCHEMA(10120, "The existing table is not compatible with the import spec. "), EXIM_FOR_NON_NATIVE(10121, "Export/Import cannot be done for a non-native table. "), INSERT_INTO_BUCKETIZED_TABLE(10122, "Bucketized tables do not support INSERT INTO:"), NO_COMPARE_BIGINT_STRING(10123, "In strict mode, comparing bigints and strings is not allowed, " + "it may result in a loss of precision. " + "If you really want to perform the operation, set hive.mapred.mode=nonstrict"), NO_COMPARE_BIGINT_DOUBLE(10124, "In strict mode, comparing bigints and doubles is not allowed, " + "it may result in a loss of precision. " + "If you really want to perform the operation, set hive.mapred.mode=nonstrict"), PARTSPEC_DIFFER_FROM_SCHEMA(10125, "Partition columns in partition specification are " + "not the same as that defined in the table schema. " + "The names and orders have to be exactly the same."), PARTITION_COLUMN_NON_PRIMITIVE(10126, "Partition column must be of primitive type."), INSERT_INTO_DYNAMICPARTITION_IFNOTEXISTS(10127, "Dynamic partitions do not support IF NOT EXISTS. 
Specified partitions with value :"), UDAF_INVALID_LOCATION(10128, "Not yet supported place for UDAF"), DROP_PARTITION_NON_STRING_PARTCOLS_NONEQUALITY(10129, "Drop partitions for a non-string partition column is only allowed using equality"), ALTER_COMMAND_FOR_VIEWS(10131, "To alter a view you need to use the ALTER VIEW command."), ALTER_COMMAND_FOR_TABLES(10132, "To alter a base table you need to use the ALTER TABLE command."), ALTER_VIEW_DISALLOWED_OP(10133, "Cannot use this form of ALTER on a view"), ALTER_TABLE_NON_NATIVE(10134, "ALTER TABLE cannot be used for a non-native table"), SORTMERGE_MAPJOIN_FAILED(10135, "Sort merge bucketed join could not be performed. " + "If you really want to perform the operation, either set " + "hive.optimize.bucketmapjoin.sortedmerge=false, or set " + "hive.enforce.sortmergebucketmapjoin=false."), BUCKET_MAPJOIN_NOT_POSSIBLE(10136, "Bucketed mapjoin cannot be performed. " + "This can be due to multiple reasons: " + " . Join columns dont match bucketed columns. " + " . Number of buckets are not a multiple of each other. " + "If you really want to perform the operation, either remove the " + "mapjoin hint from your query or set hive.enforce.bucketmapjoin to false."), BUCKETED_TABLE_METADATA_INCORRECT(10141, "Bucketed table metadata is not correct. 
" + "Fix the metadata or don't use bucketed mapjoin, by setting " + "hive.enforce.bucketmapjoin to false."), JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer join(s) " + "cannot have more than 16 aliases"), INVALID_JDO_FILTER_EXPRESSION(10143, "Invalid expression for JDO filter"), SHOW_CREATETABLE_INDEX(10144, "SHOW CREATE TABLE does not support tables of type INDEX_TABLE."), ALTER_BUCKETNUM_NONBUCKETIZED_TBL(10145, "Table is not bucketized."), TRUNCATE_FOR_NON_MANAGED_TABLE(10146, "Cannot truncate non-managed table {0}.", true), TRUNCATE_FOR_NON_NATIVE_TABLE(10147, "Cannot truncate non-native table {0}.", true), PARTSPEC_FOR_NON_PARTITIONED_TABLE(10148, "Partition spec for non partitioned table {0}.", true), LOAD_INTO_STORED_AS_DIR(10195, "A stored-as-directories table cannot be used as target for LOAD"), ALTER_TBL_STOREDASDIR_NOT_SKEWED(10196, "This operation is only valid on skewed table."), ALTER_TBL_SKEWED_LOC_NO_LOC(10197, "Alter table skewed location doesn't have locations."), ALTER_TBL_SKEWED_LOC_NO_MAP(10198, "Alter table skewed location doesn't have location map."), SKEWED_TABLE_NO_COLUMN_NAME(10200, "No skewed column name."), SKEWED_TABLE_NO_COLUMN_VALUE(10201, "No skewed values."), SKEWED_TABLE_DUPLICATE_COLUMN_NAMES(10202, "Duplicate skewed column name:"), SKEWED_TABLE_INVALID_COLUMN(10203, "Invalid skewed column name:"), SKEWED_TABLE_SKEWED_COL_NAME_VALUE_MISMATCH_1(10204, "Skewed column name is empty but skewed value is not."), SKEWED_TABLE_SKEWED_COL_NAME_VALUE_MISMATCH_2(10205, "Skewed column value is empty but skewed name is not."), SKEWED_TABLE_SKEWED_COL_NAME_VALUE_MISMATCH_3(10206, "The number of skewed column names and the number of " + "skewed column values are different: "), ALTER_TABLE_NOT_ALLOWED_RENAME_SKEWED_COLUMN(10207, " is a skewed column. 
It's not allowed to rename skewed column" + " or change skewed column type."), HIVE_GROUPING_SETS_AGGR_NOMAPAGGR(10209, "Grouping sets aggregations (with rollups or cubes) are not allowed if map-side " + " aggregation is turned off. Set hive.map.aggr=true if you want to use grouping sets"), HIVE_GROUPING_SETS_AGGR_EXPRESSION_INVALID(10210, "Grouping sets aggregations (with rollups or cubes) are not allowed if aggregation function " + "parameters overlap with the aggregation functions columns"), HIVE_GROUPING_SETS_AGGR_NOFUNC(10211, "Grouping sets aggregations are not allowed if no aggregation function is presented"), HIVE_UNION_REMOVE_OPTIMIZATION_NEEDS_SUBDIRECTORIES(10212, "In order to use hive.optimize.union.remove, the hadoop version that you are using " + "should support sub-directories for tables/partitions. If that is true, set " + "hive.hadoop.supports.subdirectories to true. Otherwise, set hive.optimize.union.remove " + "to false"), HIVE_GROUPING_SETS_EXPR_NOT_IN_GROUPBY(10213, "Grouping sets expression is not in GROUP BY key"), INVALID_PARTITION_SPEC(10214, "Invalid partition spec specified"), ALTER_TBL_UNSET_NON_EXIST_PROPERTY(10215, "Please use the following syntax if not sure " + "whether the property existed or not:\n" + "ALTER TABLE tableName UNSET TBLPROPERTIES IF EXISTS (key1, key2, ...)\n"), ALTER_VIEW_AS_SELECT_NOT_EXIST(10216, "Cannot ALTER VIEW AS SELECT if view currently does not exist\n"), REPLACE_VIEW_WITH_PARTITION(10217, "Cannot replace a view with CREATE VIEW or REPLACE VIEW or " + "ALTER VIEW AS SELECT if the view has partitions\n"), EXISTING_TABLE_IS_NOT_VIEW(10218, "Existing table is not a view\n"), NO_SUPPORTED_ORDERBY_ALLCOLREF_POS(10219, "Position in ORDER BY is not supported when using SELECT *"), INVALID_POSITION_ALIAS_IN_GROUPBY(10220, "Invalid position alias in Group By\n"), INVALID_POSITION_ALIAS_IN_ORDERBY(10221, "Invalid position alias in Order By\n"), HIVE_GROUPING_SETS_THRESHOLD_NOT_ALLOWED_WITH_SKEW(10225, "An additional 
MR job is introduced since the number of rows created per input row " + "due to grouping sets is more than hive.new.job.grouping.set.cardinality. There is no need " + "to handle skew separately. set hive.groupby.skewindata to false."), HIVE_GROUPING_SETS_THRESHOLD_NOT_ALLOWED_WITH_DISTINCTS(10226, "An additional MR job is introduced since the cardinality of grouping sets " + "is more than hive.new.job.grouping.set.cardinality. This functionality is not supported " + "with distincts. Either set hive.new.job.grouping.set.cardinality to a high number " + "(higher than the number of rows per input row due to grouping sets in the query), or " + "rewrite the query to not use distincts."), OPERATOR_NOT_ALLOWED_WITH_MAPJOIN(10227, "Not all clauses are supported with mapjoin hint. Please remove mapjoin hint."), ANALYZE_TABLE_NOSCAN_NON_NATIVE(10228, "ANALYZE TABLE NOSCAN cannot be used for " + "a non-native table"), ANALYZE_TABLE_PARTIALSCAN_NON_NATIVE(10229, "ANALYZE TABLE PARTIALSCAN cannot be used for " + "a non-native table"), ANALYZE_TABLE_PARTIALSCAN_NON_RCFILE(10230, "ANALYZE TABLE PARTIALSCAN doesn't " + "support non-RCfile. "), ANALYZE_TABLE_PARTIALSCAN_EXTERNAL_TABLE(10231, "ANALYZE TABLE PARTIALSCAN " + "doesn't support external table: "), ANALYZE_TABLE_PARTIALSCAN_AGGKEY(10232, "Analyze partialscan command " + "fails to construct aggregation for the partition "), ANALYZE_TABLE_PARTIALSCAN_AUTOGATHER(10233, "Analyze partialscan is not allowed " + "if hive.stats.autogather is set to false"), PARTITION_VALUE_NOT_CONTINUOUS(10234, "Parition values specifed are not continuous." 
+ " A subpartition value is specified without specififying the parent partition's value"), TABLES_INCOMPATIBLE_SCHEMAS(10235, "Tables have incompatible schemas and their partitions " + " cannot be exchanged."), TRUNCATE_COLUMN_INDEXED_TABLE(10236, "Can not truncate columns from table with indexes"), TRUNCATE_COLUMN_NOT_RC(10237, "Only RCFileFormat supports column truncation."), TRUNCATE_COLUMN_ARCHIVED(10238, "Column truncation cannot be performed on archived partitions."), TRUNCATE_BUCKETED_COLUMN(10239, "A column on which a partition/table is bucketed cannot be truncated."), TRUNCATE_LIST_BUCKETED_COLUMN(10240, "A column on which a partition/table is list bucketed cannot be truncated."), TABLE_NOT_PARTITIONED(10241, "Table {0} is not a partitioned table", true), DATABSAE_ALREADY_EXISTS(10242, "Database {0} already exists", true), CANNOT_REPLACE_COLUMNS(10243, "Replace columns is not supported for table {0}. SerDe may be incompatible.", true), BAD_LOCATION_VALUE(10244, "{0} is not absolute. Please specify a complete absolute uri."), UNSUPPORTED_ALTER_TBL_OP(10245, "{0} alter table options is not supported"), INVALID_BIGTABLE_MAPJOIN(10246, "{0} table chosen for streaming is not valid", true), MISSING_OVER_CLAUSE(10247, "Missing over clause for function : "), PARTITION_SPEC_TYPE_MISMATCH(10248, "Cannot add partition column {0} of type {1} as it cannot be converted to type {2}", true), UNSUPPORTED_SUBQUERY_EXPRESSION(10249, "Unsupported SubQuery Expression"), INVALID_SUBQUERY_EXPRESSION(10250, "Invalid SubQuery expression"), INVALID_HDFS_URI(10251, "{0} is not a hdfs uri", true), INVALID_DIR(10252, "{0} is not a directory", true), NO_VALID_LOCATIONS(10253, "Could not find any valid location to place the jars. 
" + "Please update hive.jar.directory or hive.user.install.directory with a valid location", false), UNSUPPORTED_AUTHORIZATION_PRINCIPAL_TYPE_GROUP(10254, "Principal type GROUP is not supported in this authorization setting", "28000"), INVALID_TABLE_NAME(10255, "Invalid table name {0}", true), INSERT_INTO_IMMUTABLE_TABLE(10256, "Inserting into a non-empty immutable table is not allowed"), UNSUPPORTED_AUTHORIZATION_RESOURCE_TYPE_GLOBAL(10257, "Resource type GLOBAL is not supported in this authorization setting", "28000"), UNSUPPORTED_AUTHORIZATION_RESOURCE_TYPE_COLUMN(10258, "Resource type COLUMN is not supported in this authorization setting", "28000"), TXNMGR_NOT_SPECIFIED(10260, "Transaction manager not specified correctly, " + "set hive.txn.manager"), TXNMGR_NOT_INSTANTIATED(10261, "Transaction manager could not be " + "instantiated, check hive.txn.manager"), TXN_NO_SUCH_TRANSACTION(10262, "No record of transaction {0} could be found, " + "may have timed out", true), TXN_ABORTED(10263, "Transaction manager has aborted the transaction {0}.", true), DBTXNMGR_REQUIRES_CONCURRENCY(10264, "To use DbTxnManager you must set hive.support.concurrency=true"), TXNMGR_NOT_ACID(10265, "This command is not allowed on an ACID table {0}.{1} with a non-ACID transaction manager", true), LOCK_NO_SUCH_LOCK(10270, "No record of lock {0} could be found, " + "may have timed out", true), LOCK_REQUEST_UNSUPPORTED(10271, "Current transaction manager does not " + "support explicit lock requests. 
Transaction manager: "), METASTORE_COMMUNICATION_FAILED(10280, "Error communicating with the " + "metastore"), METASTORE_COULD_NOT_INITIATE(10281, "Unable to initiate connection to the " + "metastore."), INVALID_COMPACTION_TYPE(10282, "Invalid compaction type, supported values are 'major' and " + "'minor'"), NO_COMPACTION_PARTITION(10283, "You must specify a partition to compact for partitioned tables"), TOO_MANY_COMPACTION_PARTITIONS(10284, "Compaction can only be requested on one partition at a " + "time."), DISTINCT_NOT_SUPPORTED(10285, "Distinct keyword is not support in current context"), UPDATEDELETE_PARSE_ERROR(10290, "Encountered parse error while parsing rewritten update or " + "delete query"), UPDATEDELETE_IO_ERROR(10291, "Encountered I/O error while parsing rewritten update or " + "delete query"), UPDATE_CANNOT_UPDATE_PART_VALUE(10292, "Updating values of partition columns is not supported"), INSERT_CANNOT_CREATE_TEMP_FILE(10293, "Unable to create temp file for insert values "), ACID_OP_ON_NONACID_TXNMGR(10294, "Attempt to do update or delete using transaction manager that" + " does not support these operations."), NO_INSERT_OVERWRITE_WITH_ACID(10295, "INSERT OVERWRITE not allowed on table with OutputFormat " + "that implements AcidOutputFormat while transaction manager that supports ACID is in use"), VALUES_TABLE_CONSTRUCTOR_NOT_SUPPORTED(10296, "Values clause with table constructor not yet supported"), ACID_OP_ON_NONACID_TABLE(10297, "Attempt to do update or delete on table {0} that does not use " + "an AcidOutputFormat or is not bucketed", true), ACID_NO_SORTED_BUCKETS(10298, "ACID insert, update, delete not supported on tables that are " + "sorted, table {0}", true), ALTER_TABLE_TYPE_PARTIAL_PARTITION_SPEC_NO_SUPPORTED(10299, "Alter table partition type {0} does not allow partial partition spec", true), ALTER_TABLE_PARTITION_CASCADE_NOT_SUPPORTED(10300, "Alter table partition type {0} does not support cascade", true), DROP_NATIVE_FUNCTION(10301, 
"Cannot drop native function"), UPDATE_CANNOT_UPDATE_BUCKET_VALUE(10302, "Updating values of bucketing columns is not supported. Column {0}.", true), IMPORT_INTO_STRICT_REPL_TABLE(10303,"Non-repl import disallowed against table that is a destination of replication."), CTAS_LOCATION_NONEMPTY(10304, "CREATE-TABLE-AS-SELECT cannot create table with location to a non-empty directory."), CTAS_CREATES_VOID_TYPE(10305, "CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field: "), TBL_SORTED_NOT_BUCKETED(10306, "Destination table {0} found to be sorted but not bucketed.", true), //{2} should be lockid LOCK_ACQUIRE_TIMEDOUT(10307, "Lock acquisition for {0} timed out after {1}ms. {2}", true), COMPILE_LOCK_TIMED_OUT(10308, "Attempt to acquire compile lock timed out.", true), CANNOT_CHANGE_SERDE(10309, "Changing SerDe (from {0}) is not supported for table {1}. File format may be incompatible", true), CANNOT_CHANGE_FILEFORMAT(10310, "Changing file format (from {0}) is not supported for table {1}", true), CANNOT_REORDER_COLUMNS(10311, "Reordering columns is not supported for table {0}. SerDe may be incompatible", true), CANNOT_CHANGE_COLUMN_TYPE(10312, "Changing from type {0} to {1} is not supported for column {2}. SerDe may be incompatible", true), REPLACE_CANNOT_DROP_COLUMNS(10313, "Replacing columns cannot drop columns for table {0}. SerDe may be incompatible", true), REPLACE_UNSUPPORTED_TYPE_CONVERSION(10314, "Replacing columns with unsupported type conversion (from {0} to {1}) for column {2}. SerDe may be incompatible", true), HIVE_GROUPING_SETS_AGGR_NOMAPAGGR_MULTIGBY(10315, "Grouping sets aggregations (with rollups or cubes) are not allowed when " + "HIVEMULTIGROUPBYSINGLEREDUCER is turned on. 
Set hive.multigroupby.singlereducer=false if you want to use grouping sets"), CANNOT_RETRIEVE_TABLE_METADATA(10316, "Error while retrieving table metadata"), CANNOT_DROP_INDEX(10317, "Error while dropping index"), INVALID_AST_TREE(10318, "Internal error : Invalid AST"), ERROR_SERIALIZE_METASTORE(10319, "Error while serializing the metastore objects"), IO_ERROR(10320, "Error while peforming IO operation "), ERROR_SERIALIZE_METADATA(10321, "Error while serializing the metadata"), INVALID_LOAD_TABLE_FILE_WORK(10322, "Invalid Load Table Work or Load File Work"), CLASSPATH_ERROR(10323, "Classpath error"), IMPORT_SEMANTIC_ERROR(10324, "Import Semantic Analyzer Error"),

@rxin @cloud-fan @liancheng @yhuai Do you think we can open an umbrella JIRA for the whole community to track whether the same or similar error messages should be issued by Spark SQL? That could help us find the potential holes and improve the code quality.

SparkQA · 2016-06-24T07:42:33Z

Test build #61154 has finished for PR 13756 at commit 53417f1.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class ElementwiseProduct @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- class Normalizer @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- class PolynomialExpansion @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- public class JavaPackage
- case class StreamingRelationExec(sourceName: String, output: Seq[Attribute]) extends LeafExecNode

SparkQA · 2016-06-24T08:13:54Z

Test build #61153 has finished for PR 13756 at commit c0e7e0c.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.

gatorsmile · 2016-06-24T15:56:58Z

retest this please

SparkQA · 2016-06-24T17:36:38Z

Test build #61179 has finished for PR 13756 at commit 53417f1.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class ElementwiseProduct @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- class Normalizer @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- class PolynomialExpansion @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- public class JavaPackage
- case class StreamingRelationExec(sourceName: String, output: Seq[Attribute]) extends LeafExecNode

gatorsmile · 2016-07-01T19:34:36Z

retest this please

SparkQA · 2016-07-01T21:20:47Z

Test build #61634 has finished for PR 13756 at commit 53417f1.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class ElementwiseProduct @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- class Normalizer @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- class PolynomialExpansion @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- public class JavaPackage
- case class StreamingRelationExec(sourceName: String, output: Seq[Attribute]) extends LeafExecNode

gatorsmile · 2016-07-01T22:09:02Z

ping @liancheng @cloud-fan

liancheng · 2016-07-05T09:04:24Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

  /**
-   * Create a table, returning either a [[CreateTableCommand]] or a
-   * [[CreateHiveTableAsSelectLogicalPlan]].
+   * Create a table, returning either a [[CreateTableCommand]], a


Nit: Remove "either".

liancheng · 2016-07-05T10:31:10Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala


+      case c: CreateTableCommand =>
+        val allColNamesInSchema = c.table.schema.map(_.name)
+        val colNames = allColNamesInSchema.diff(c.table.partitionColumnNames)


Is it safe to do case sensitive comparison here?
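To illustrate the concern: `Seq.diff` compares strings exactly, so a schema column `Year` and a partition column `year` would not match under case-insensitive resolution. Below is a minimal standalone sketch of a case-aware comparison. The object `DuplicateColumnCheck`, both helper names, and the `caseSensitive` flag are hypothetical and only for illustration; they are not Spark APIs and not the code in this PR.

```scala
import java.util.Locale

// Hypothetical helpers for illustration only; not part of Spark.
object DuplicateColumnCheck {
  // Normalizes a column name according to the assumed case-sensitivity setting.
  private def normalize(name: String, caseSensitive: Boolean): String =
    if (caseSensitive) name else name.toLowerCase(Locale.ROOT)

  // Returns the (normalized) names that occur more than once, sorted.
  def findDuplicates(names: Seq[String], caseSensitive: Boolean): Seq[String] =
    names
      .map(normalize(_, caseSensitive))
      .groupBy(identity)
      .collect { case (name, group) if group.size > 1 => name }
      .toSeq
      .sorted

  // Returns the schema columns that are not partition columns,
  // comparing names under the same case-sensitivity setting.
  def nonPartitionColumns(
      schemaCols: Seq[String],
      partitionCols: Seq[String],
      caseSensitive: Boolean): Seq[String] = {
    val partitionSet = partitionCols.map(normalize(_, caseSensitive)).toSet
    schemaCols.filterNot(c => partitionSet.contains(normalize(c, caseSensitive)))
  }
}
```

With `caseSensitive = false`, `Year` is treated as the partition column `year` and dropped from the remaining columns; with `caseSensitive = true`, it is kept, which is the behavior an exact `diff` would give.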

SparkQA · 2016-07-05T19:39:45Z

Test build #61770 has finished for PR 13756 at commit 08b5374.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-07-05T20:07:47Z

retest this please

SparkQA · 2016-07-05T21:59:08Z

Test build #61785 has finished for PR 13756 at commit 08b5374.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-07-12T17:23:51Z

retest this please

SparkQA · 2016-07-12T19:22:54Z

Test build #62182 has finished for PR 13756 at commit 08b5374.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-07-12T20:58:34Z

cc @liancheng @cloud-fan

gatorsmile · 2016-07-23T05:41:06Z

retest this please

SparkQA · 2016-07-23T07:22:36Z

Test build #62746 has finished for PR 13756 at commit 08b5374.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-07-28T20:28:01Z

retest this please

SparkQA · 2016-07-28T22:27:26Z

Test build #62985 has finished for PR 13756 at commit 08b5374.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-08-04T19:31:39Z

This is part of #14482. Close it now

fix

83082ff

gatorsmile changed the title ~~[SPARK-16041][SQL] Disallow Duplicate Columns in partitionBy, blockBy and sortBy in DataFrameWriter~~ [SPARK-16041][SQL] Disallow Duplicate Columns in partitionBy, bucketBy and sortBy in DataFrameWriter Jun 20, 2016

Merge remote-tracking branch 'upstream/master' into dedup

83828fe

gatorsmile added 2 commits June 21, 2016 21:31

address comments

785d625

Merge remote-tracking branch 'upstream/master' into dedup

ae15ea9

gatorsmile changed the title ~~[SPARK-16041][SQL] Disallow Duplicate Columns in partitionBy, bucketBy and sortBy in DataFrameWriter~~ [SPARK-16041][SQL] Disallow Duplicate Columns in partitionBy, bucketBy and sortBy Jun 22, 2016

cloud-fan reviewed Jun 23, 2016

address comments

24edb5f

gatorsmile added 3 commits June 23, 2016 20:00

revert

69d7de6

address comments

c0e7e0c

Merge remote-tracking branch 'upstream/master' into dedup

53417f1

gatorsmile reviewed Jun 24, 2016

liancheng reviewed Jul 5, 2016

gatorsmile added 2 commits July 5, 2016 10:01

Merge remote-tracking branch 'upstream/master' into dedup

6bd359c

address comments.

08b5374

gatorsmile mentioned this pull request Aug 4, 2016

[SPARK-16879][SQL] unify logical plans for CREATE TABLE and CTAS #14482

Closed

gatorsmile closed this Aug 4, 2016
