@gatorsmile
Member

@gatorsmile gatorsmile commented Jun 18, 2016

What changes were proposed in this pull request?

Duplicate columns should not be allowed in partitionBy, bucketBy, and sortBy in DataFrameWriter. Duplicate columns can cause unpredictable results, for example a resolution failure.

This PR detects the duplicates and throws exceptions with appropriate messages.
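
A minimal sketch of the kind of check described here, not the code in this patch. The names `DuplicateColumnCheck`, `findDuplicates`, and `checkDuplicates` are hypothetical, the case-insensitive default is an assumption, and `IllegalArgumentException` stands in for whatever exception Spark actually raises so that the example stays self-contained.

```scala
object DuplicateColumnCheck {
  // Returns the names that occur more than once. Names are lower-cased first
  // when the comparison is case-insensitive. Sorted so messages are stable.
  def findDuplicates(columnNames: Seq[String], caseSensitive: Boolean): Seq[String] = {
    val normalized =
      if (caseSensitive) columnNames
      else columnNames.map(_.toLowerCase(java.util.Locale.ROOT))
    normalized
      .groupBy(identity)
      .collect { case (name, occurrences) if occurrences.length > 1 => name }
      .toSeq
      .sorted
  }

  // Throws if any column is listed twice. `clause` names the offending
  // clause, e.g. "partitionBy", so the message tells the user where to look.
  def checkDuplicates(
      columnNames: Seq[String],
      clause: String,
      caseSensitive: Boolean = false): Unit = {
    val duplicates = findDuplicates(columnNames, caseSensitive)
    if (duplicates.nonEmpty) {
      throw new IllegalArgumentException(
        s"Found duplicate column(s) in $clause: ${duplicates.mkString(", ")}")
    }
  }
}
```

Calling `DuplicateColumnCheck.checkDuplicates(Seq("year", "year"), "partitionBy")` would fail fast with a message that names the clause, instead of surfacing later as a resolution failure.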

How was this patch tested?

Added test cases in DataFrameReaderWriterSuite

@SparkQA

SparkQA commented Jun 18, 2016

Test build #60764 has finished for PR 13756 at commit 83082ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

@cloud-fan @yhuai @liancheng @clockfly Could you please review this PR? Thanks!

@liancheng
Contributor

Do you mean bucketBy instead of blockBy in the PR title?

@liancheng
Contributor

I think it would be better to move these checks to the analyzer, so that the SQL equivalents of those structures (partitioning and bucketing) can also benefit from them.

@gatorsmile
Member Author

: ) Sharp eye! blockBy is the parameter name I used in another project. Sorry for the wrong name; I have made this mistake more than once.

Let me try to find a common place in the Analyzer for this. Thanks!

@gatorsmile gatorsmile changed the title from "[SPARK-16041][SQL] Disallow Duplicate Columns in partitionBy, blockBy and sortBy in DataFrameWriter" to "[SPARK-16041][SQL] Disallow Duplicate Columns in partitionBy, bucketBy and sortBy in DataFrameWriter" on Jun 20, 2016
@gatorsmile
Member Author

gatorsmile commented Jun 22, 2016

It sounds like the PreWriteCheck rule is a good home for this check. Let me add it now.

@gatorsmile gatorsmile changed the title from "[SPARK-16041][SQL] Disallow Duplicate Columns in partitionBy, bucketBy and sortBy in DataFrameWriter" to "[SPARK-16041][SQL] Disallow Duplicate Columns in partitionBy, bucketBy and sortBy" on Jun 22, 2016
@SparkQA

SparkQA commented Jun 22, 2016

Test build #61005 has finished for PR 13756 at commit ae15ea9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// The relation in l is not an InsertableRelation.
failAnalysis(s"$l does not allow insertion.")

case c: CreateTableUsing =>
Contributor

How about CreateTableCommand and CreateHiveTableAsSelectLogicalPlan?

Member Author

: ) True. Let me add them now.

Member Author

I only found one case: CREATE TABLE with PARTITION BY. Let me explain what I found.

First, CREATE TABLE command does not support bucketSpec. See code

Second, CREATE TABLE AS SELECT, which can generate CreateHiveTableAsSelectLogicalPlan, does not allow users to specify the schema, which includes the partitionBy columns.

Let me know if anything is still missing. Thanks!

@SparkQA

SparkQA commented Jun 23, 2016

Test build #61099 has finished for PR 13756 at commit 24edb5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

I'm wondering whether it's possible to concentrate the error-checking logic for table creation in one place. For example, we check for duplicate table column names in the parser for SQL statements (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L918), and in the command for DataFrameWriter (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L77).

Some checks are only valid for SQL statements, e.g. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L925

It would be good if we could abstract the common pattern, put all the error-checking logic together, and have an individual test suite to test it.
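
One way to picture the idea of a single home for these checks is sketched below. It is an illustration only: `TableSpec` and `TableSpecValidator` are hypothetical names, not Spark classes, and the case-insensitive comparison is an assumption. Both the SQL path and the DataFrameWriter path would build the same description and call one validator.

```scala
// Hypothetical description of a table being created; not a Spark class.
final case class TableSpec(
    columns: Seq[String],
    partitionColumns: Seq[String] = Nil,
    bucketColumns: Seq[String] = Nil,
    sortColumns: Seq[String] = Nil)

object TableSpecValidator {
  // Names that appear more than once, compared case-insensitively (assumption).
  private def duplicates(names: Seq[String]): Seq[String] =
    names
      .map(_.toLowerCase(java.util.Locale.ROOT))
      .groupBy(identity)
      .collect { case (name, group) if group.length > 1 => name }
      .toSeq
      .sorted

  // Collects every problem instead of failing on the first one, so a single
  // test suite can assert on the complete list of messages.
  def validate(spec: TableSpec): Seq[String] = {
    val clauses = Seq(
      "table definition" -> spec.columns,
      "partitionBy" -> spec.partitionColumns,
      "bucketBy" -> spec.bucketColumns,
      "sortBy" -> spec.sortColumns)
    clauses.flatMap { case (clause, names) =>
      duplicates(names).map(name => s"Duplicate column '$name' in $clause")
    }
  }
}
```

Returning messages rather than throwing keeps the validator easy to test in isolation; a caller can still turn a non-empty result into an exception.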

@gatorsmile
Member Author

Your suggestion is very good! Let me try it tonight. Thanks!


case p @ CreateHiveTableAsSelectLogicalPlan(table, child, allowExisting) =>
// Ensuring whether no duplicate name is used in table definition
checkDuplicates(child.output.map(_.name), s"table definition of ${table.identifier}")
Member Author

@gatorsmile gatorsmile Jun 24, 2016

PreWriteCheck is executed after conversion from CreateHiveTableAsSelectLogicalPlan to execution.CreateHiveTableAsSelectCommand. PreWriteCheck is unable to access the Hive package execution.CreateHiveTableAsSelectCommand. Thus, I have no clue how to move this into PreWriteCheck. Introduce a new rule?

Actually, in Hive, there is a stage called Semantic Analysis, which is done before Analyzer but after Parser. That stage is for checking these semantic errors. Not sure whether we should add a similar concept into Spark SQL?

Contributor

Sounds like a good idea. Do you have more information about the Semantic Analysis phase? What kind of checks can be done there?

Member Author

@gatorsmile gatorsmile Jun 24, 2016

Actually, my previous comment is not accurate. In Hive, they just split what our Analyzer does into two phases: Semantic Analyzer and Logical Plan Generator. The Semantic Analyzer also resolves the relations through the catalog. Please ignore what I said above. : )

To answer your last question, let me post the error messages generated by semantic analyzer. The range of error codes from 10000 to 19999 is used by semantic analyzer:

  INVALID_TABLE(10001, "Table not found", "42S02"),
  INVALID_COLUMN(10002, "Invalid column reference"),
  INVALID_INDEX(10003, "Invalid index"),
  INVALID_TABLE_OR_COLUMN(10004, "Invalid table alias or column reference"),
  AMBIGUOUS_TABLE_OR_COLUMN(10005, "Ambiguous table alias or column reference"),
  INVALID_PARTITION(10006, "Partition not found"),
  AMBIGUOUS_COLUMN(10007, "Ambiguous column reference"),
  AMBIGUOUS_TABLE_ALIAS(10008, "Ambiguous table alias"),
  INVALID_TABLE_ALIAS(10009, "Invalid table alias"),
  NO_TABLE_ALIAS(10010, "No table alias"),
  INVALID_FUNCTION(10011, "Invalid function"),
  INVALID_FUNCTION_SIGNATURE(10012, "Function argument type mismatch"),
  INVALID_OPERATOR_SIGNATURE(10013, "Operator argument type mismatch"),
  INVALID_ARGUMENT(10014, "Wrong arguments"),
  INVALID_ARGUMENT_LENGTH(10015, "Arguments length mismatch", "21000"),
  INVALID_ARGUMENT_TYPE(10016, "Argument type mismatch"),
  INVALID_JOIN_CONDITION_1(10017, "Both left and right aliases encountered in JOIN"),
  INVALID_JOIN_CONDITION_2(10018, "Neither left nor right aliases encountered in JOIN"),
  INVALID_JOIN_CONDITION_3(10019, "OR not supported in JOIN currently"),
  INVALID_TRANSFORM(10020, "TRANSFORM with other SELECT columns not supported"),
  UNSUPPORTED_MULTIPLE_DISTINCTS(10022, "DISTINCT on different columns not supported" +
      " with skew in data"),
  NO_SUBQUERY_ALIAS(10023, "No alias for subquery"),
  NO_INSERT_INSUBQUERY(10024, "Cannot insert in a subquery. Inserting to table "),
  NON_KEY_EXPR_IN_GROUPBY(10025, "Expression not in GROUP BY key"),
  INVALID_XPATH(10026, "General . and [] operators are not supported"),
  INVALID_PATH(10027, "Invalid path"),
  ILLEGAL_PATH(10028, "Path is not legal"),
  INVALID_NUMERICAL_CONSTANT(10029, "Invalid numerical constant"),
  INVALID_ARRAYINDEX_TYPE(10030,
      "Not proper type for index of ARRAY. Currently, only integer type is supported"),
  INVALID_MAPINDEX_CONSTANT(10031, "Non-constant expression for map indexes not supported"),
  INVALID_MAPINDEX_TYPE(10032, "MAP key type does not match index expression type"),
  NON_COLLECTION_TYPE(10033, "[] not valid on non-collection types"),
  SELECT_DISTINCT_WITH_GROUPBY(10034, "SELECT DISTINCT and GROUP BY can not be in the same query"),
  COLUMN_REPEATED_IN_PARTITIONING_COLS(10035, "Column repeated in partitioning columns"),
  DUPLICATE_COLUMN_NAMES(10036, "Duplicate column name:"),
  INVALID_BUCKET_NUMBER(10037, "Bucket number should be bigger than zero"),
  COLUMN_REPEATED_IN_CLUSTER_SORT(10038, "Same column cannot appear in CLUSTER BY and SORT BY"),
  SAMPLE_RESTRICTION(10039, "Cannot SAMPLE on more than two columns"),
  SAMPLE_COLUMN_NOT_FOUND(10040, "SAMPLE column not found"),
  NO_PARTITION_PREDICATE(10041, "No partition predicate found"),
  INVALID_DOT(10042, ". Operator is only supported on struct or list of struct types"),
  INVALID_TBL_DDL_SERDE(10043, "Either list of columns or a custom serializer should be specified"),
  TARGET_TABLE_COLUMN_MISMATCH(10044,
      "Cannot insert into target table because column number/types are different"),
  TABLE_ALIAS_NOT_ALLOWED(10045, "Table alias not allowed in sampling clause"),
  CLUSTERBY_DISTRIBUTEBY_CONFLICT(10046, "Cannot have both CLUSTER BY and DISTRIBUTE BY clauses"),
  ORDERBY_DISTRIBUTEBY_CONFLICT(10047, "Cannot have both ORDER BY and DISTRIBUTE BY clauses"),
  CLUSTERBY_SORTBY_CONFLICT(10048, "Cannot have both CLUSTER BY and SORT BY clauses"),
  ORDERBY_SORTBY_CONFLICT(10049, "Cannot have both ORDER BY and SORT BY clauses"),
  CLUSTERBY_ORDERBY_CONFLICT(10050, "Cannot have both CLUSTER BY and ORDER BY clauses"),
  NO_LIMIT_WITH_ORDERBY(10051, "In strict mode, if ORDER BY is specified, "
      + "LIMIT must also be specified"),
  NO_CARTESIAN_PRODUCT(10052, "In strict mode, cartesian product is not allowed. "
      + "If you really want to perform the operation, set hive.mapred.mode=nonstrict"),
  UNION_NOTIN_SUBQ(10053, "Top level UNION is not supported currently; "
      + "use a subquery for the UNION"),
  INVALID_INPUT_FORMAT_TYPE(10054, "Input format must implement InputFormat"),
  INVALID_OUTPUT_FORMAT_TYPE(10055, "Output Format must implement HiveOutputFormat, "
      + "otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat"),
  NO_VALID_PARTN(10056, "The query does not reference any valid partition. "
      + "To run this query, set hive.mapred.mode=nonstrict"),
  NO_OUTER_MAPJOIN(10057, "MAPJOIN cannot be performed with OUTER JOIN"),
  INVALID_MAPJOIN_HINT(10058, "All tables are specified as map-table for join"),
  INVALID_MAPJOIN_TABLE(10059, "Result of a union cannot be a map table"),
  NON_BUCKETED_TABLE(10060, "Sampling expression needed for non-bucketed table"),
  BUCKETED_NUMERATOR_BIGGER_DENOMINATOR(10061, "Numerator should not be bigger than "
      + "denominator in sample clause for table"),
  NEED_PARTITION_ERROR(10062, "Need to specify partition columns because the destination "
      + "table is partitioned"),
  CTAS_CTLT_COEXISTENCE(10063, "Create table command does not allow LIKE and AS-SELECT in "
      + "the same command"),
  LINES_TERMINATED_BY_NON_NEWLINE(10064, "LINES TERMINATED BY only supports "
      + "newline '\\n' right now"),
  CTAS_COLLST_COEXISTENCE(10065, "CREATE TABLE AS SELECT command cannot specify "
      + "the list of columns "
      + "for the target table"),
  CTLT_COLLST_COEXISTENCE(10066, "CREATE TABLE LIKE command cannot specify the list of columns for "
      + "the target table"),
  INVALID_SELECT_SCHEMA(10067, "Cannot derive schema from the select-clause"),
  CTAS_PARCOL_COEXISTENCE(10068, "CREATE-TABLE-AS-SELECT does not support "
      + "partitioning in the target table "),
  CTAS_MULTI_LOADFILE(10069, "CREATE-TABLE-AS-SELECT results in multiple file load"),
  CTAS_EXTTBL_COEXISTENCE(10070, "CREATE-TABLE-AS-SELECT cannot create external table"),
  INSERT_EXTERNAL_TABLE(10071, "Inserting into a external table is not allowed"),
  DATABASE_NOT_EXISTS(10072, "Database does not exist:"),
  TABLE_ALREADY_EXISTS(10073, "Table already exists:", "42S02"),
  COLUMN_ALIAS_ALREADY_EXISTS(10074, "Column alias already exists:", "42S02"),
  UDTF_MULTIPLE_EXPR(10075, "Only a single expression in the SELECT clause is "
      + "supported with UDTF's"),
  @Deprecated UDTF_REQUIRE_AS(10076, "UDTF's require an AS clause"),
  UDTF_NO_GROUP_BY(10077, "GROUP BY is not supported with a UDTF in the SELECT clause"),
  UDTF_NO_SORT_BY(10078, "SORT BY is not supported with a UDTF in the SELECT clause"),
  UDTF_NO_CLUSTER_BY(10079, "CLUSTER BY is not supported with a UDTF in the SELECT clause"),
  UDTF_NO_DISTRIBUTE_BY(10080, "DISTRUBTE BY is not supported with a UDTF in the SELECT clause"),
  UDTF_INVALID_LOCATION(10081, "UDTF's are not supported outside the SELECT clause, nor nested "
      + "in expressions"),
  UDTF_LATERAL_VIEW(10082, "UDTF's cannot be in a select expression when there is a lateral view"),
  UDTF_ALIAS_MISMATCH(10083, "The number of aliases supplied in the AS clause does not match the "
      + "number of columns output by the UDTF"),
  UDF_STATEFUL_INVALID_LOCATION(10084, "Stateful UDF's can only be invoked in the SELECT list"),
  LATERAL_VIEW_WITH_JOIN(10085, "JOIN with a LATERAL VIEW is not supported"),
  LATERAL_VIEW_INVALID_CHILD(10086, "LATERAL VIEW AST with invalid child"),
  OUTPUT_SPECIFIED_MULTIPLE_TIMES(10087, "The same output cannot be present multiple times: "),
  INVALID_AS(10088, "AS clause has an invalid number of aliases"),
  VIEW_COL_MISMATCH(10089, "The number of columns produced by the SELECT clause does not match the "
      + "number of column names specified by CREATE VIEW"),
  DML_AGAINST_VIEW(10090, "A view cannot be used as target table for LOAD or INSERT"),
  ANALYZE_VIEW(10091, "ANALYZE is not supported for views"),
  VIEW_PARTITION_TOTAL(10092, "At least one non-partitioning column must be present in view"),
  VIEW_PARTITION_MISMATCH(10093, "Rightmost columns in view output do not match "
      + "PARTITIONED ON clause"),
  PARTITION_DYN_STA_ORDER(10094, "Dynamic partition cannot be the parent of a static partition"),
  DYNAMIC_PARTITION_DISABLED(10095, "Dynamic partition is disabled. Either enable it by setting "
      + "hive.exec.dynamic.partition=true or specify partition column values"),
  DYNAMIC_PARTITION_STRICT_MODE(10096, "Dynamic partition strict mode requires at least one "
      + "static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict"),
  NONEXISTPARTCOL(10098, "Non-Partition column appears in the partition specification: "),
  UNSUPPORTED_TYPE(10099, "DATETIME type isn't supported yet. Please use "
      + "DATE or TIMESTAMP instead"),
  CREATE_NON_NATIVE_AS(10100, "CREATE TABLE AS SELECT cannot be used for a non-native table"),
  LOAD_INTO_NON_NATIVE(10101, "A non-native table cannot be used as target for LOAD"),
  LOCKMGR_NOT_SPECIFIED(10102, "Lock manager not specified correctly, set hive.lock.manager"),
  LOCKMGR_NOT_INITIALIZED(10103, "Lock manager could not be initialized, check hive.lock.manager "),
  LOCK_CANNOT_BE_ACQUIRED(10104, "Locks on the underlying objects cannot be acquired. "
      + "retry after some time"),
  ZOOKEEPER_CLIENT_COULD_NOT_BE_INITIALIZED(10105, "Check hive.zookeeper.quorum "
      + "and hive.zookeeper.client.port"),
  OVERWRITE_ARCHIVED_PART(10106, "Cannot overwrite an archived partition. " +
      "Unarchive before running this command"),
  ARCHIVE_METHODS_DISABLED(10107, "Archiving methods are currently disabled. " +
      "Please see the Hive wiki for more information about enabling archiving"),
  ARCHIVE_ON_MULI_PARTS(10108, "ARCHIVE can only be run on a single partition"),
  UNARCHIVE_ON_MULI_PARTS(10109, "ARCHIVE can only be run on a single partition"),
  ARCHIVE_ON_TABLE(10110, "ARCHIVE can only be run on partitions"),
  RESERVED_PART_VAL(10111, "Partition value contains a reserved substring"),
  OFFLINE_TABLE_OR_PARTITION(10113, "Query against an offline table or partition"),
  OUTERJOIN_USES_FILTERS(10114, "The query results could be wrong. " +
                         "Turn on hive.outerjoin.supports.filters"),
  NEED_PARTITION_SPECIFICATION(10115, "Table is partitioned and partition specification is needed"),
  INVALID_METADATA(10116, "The metadata file could not be parsed "),
  NEED_TABLE_SPECIFICATION(10117, "Table name could be determined; It should be specified "),
  PARTITION_EXISTS(10118, "Partition already exists"),
  TABLE_DATA_EXISTS(10119, "Table exists and contains data files"),
  INCOMPATIBLE_SCHEMA(10120, "The existing table is not compatible with the import spec. "),
  EXIM_FOR_NON_NATIVE(10121, "Export/Import cannot be done for a non-native table. "),
  INSERT_INTO_BUCKETIZED_TABLE(10122, "Bucketized tables do not support INSERT INTO:"),
  NO_COMPARE_BIGINT_STRING(10123, "In strict mode, comparing bigints and strings is not allowed, "
      + "it may result in a loss of precision. "
      + "If you really want to perform the operation, set hive.mapred.mode=nonstrict"),
  NO_COMPARE_BIGINT_DOUBLE(10124, "In strict mode, comparing bigints and doubles is not allowed, "
      + "it may result in a loss of precision. "
      + "If you really want to perform the operation, set hive.mapred.mode=nonstrict"),
  PARTSPEC_DIFFER_FROM_SCHEMA(10125, "Partition columns in partition specification are "
      + "not the same as that defined in the table schema. "
      + "The names and orders have to be exactly the same."),
  PARTITION_COLUMN_NON_PRIMITIVE(10126, "Partition column must be of primitive type."),
  INSERT_INTO_DYNAMICPARTITION_IFNOTEXISTS(10127,
      "Dynamic partitions do not support IF NOT EXISTS. Specified partitions with value :"),
  UDAF_INVALID_LOCATION(10128, "Not yet supported place for UDAF"),
  DROP_PARTITION_NON_STRING_PARTCOLS_NONEQUALITY(10129,
    "Drop partitions for a non-string partition column is only allowed using equality"),
  ALTER_COMMAND_FOR_VIEWS(10131, "To alter a view you need to use the ALTER VIEW command."),
  ALTER_COMMAND_FOR_TABLES(10132, "To alter a base table you need to use the ALTER TABLE command."),
  ALTER_VIEW_DISALLOWED_OP(10133, "Cannot use this form of ALTER on a view"),
  ALTER_TABLE_NON_NATIVE(10134, "ALTER TABLE cannot be used for a non-native table"),
  SORTMERGE_MAPJOIN_FAILED(10135,
      "Sort merge bucketed join could not be performed. " +
      "If you really want to perform the operation, either set " +
      "hive.optimize.bucketmapjoin.sortedmerge=false, or set " +
      "hive.enforce.sortmergebucketmapjoin=false."),
  BUCKET_MAPJOIN_NOT_POSSIBLE(10136,
    "Bucketed mapjoin cannot be performed. " +
    "This can be due to multiple reasons: " +
    " . Join columns dont match bucketed columns. " +
    " . Number of buckets are not a multiple of each other. " +
    "If you really want to perform the operation, either remove the " +
    "mapjoin hint from your query or set hive.enforce.bucketmapjoin to false."),

  BUCKETED_TABLE_METADATA_INCORRECT(10141,
   "Bucketed table metadata is not correct. " +
    "Fix the metadata or don't use bucketed mapjoin, by setting " +
    "hive.enforce.bucketmapjoin to false."),

  JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer join(s) " +
      "cannot have more than 16 aliases"),

  INVALID_JDO_FILTER_EXPRESSION(10143, "Invalid expression for JDO filter"),

  SHOW_CREATETABLE_INDEX(10144, "SHOW CREATE TABLE does not support tables of type INDEX_TABLE."),
  ALTER_BUCKETNUM_NONBUCKETIZED_TBL(10145, "Table is not bucketized."),

  TRUNCATE_FOR_NON_MANAGED_TABLE(10146, "Cannot truncate non-managed table {0}.", true),
  TRUNCATE_FOR_NON_NATIVE_TABLE(10147, "Cannot truncate non-native table {0}.", true),
  PARTSPEC_FOR_NON_PARTITIONED_TABLE(10148, "Partition spec for non partitioned table {0}.", true),

  LOAD_INTO_STORED_AS_DIR(10195, "A stored-as-directories table cannot be used as target for LOAD"),
  ALTER_TBL_STOREDASDIR_NOT_SKEWED(10196, "This operation is only valid on skewed table."),
  ALTER_TBL_SKEWED_LOC_NO_LOC(10197, "Alter table skewed location doesn't have locations."),
  ALTER_TBL_SKEWED_LOC_NO_MAP(10198, "Alter table skewed location doesn't have location map."),
  SKEWED_TABLE_NO_COLUMN_NAME(10200, "No skewed column name."),
  SKEWED_TABLE_NO_COLUMN_VALUE(10201, "No skewed values."),
  SKEWED_TABLE_DUPLICATE_COLUMN_NAMES(10202,
      "Duplicate skewed column name:"),
  SKEWED_TABLE_INVALID_COLUMN(10203,
      "Invalid skewed column name:"),
  SKEWED_TABLE_SKEWED_COL_NAME_VALUE_MISMATCH_1(10204,
      "Skewed column name is empty but skewed value is not."),
  SKEWED_TABLE_SKEWED_COL_NAME_VALUE_MISMATCH_2(10205,
      "Skewed column value is empty but skewed name is not."),
  SKEWED_TABLE_SKEWED_COL_NAME_VALUE_MISMATCH_3(10206,
      "The number of skewed column names and the number of " +
      "skewed column values are different: "),
  ALTER_TABLE_NOT_ALLOWED_RENAME_SKEWED_COLUMN(10207,
      " is a skewed column. It's not allowed to rename skewed column"
          + " or change skewed column type."),
  HIVE_GROUPING_SETS_AGGR_NOMAPAGGR(10209,
    "Grouping sets aggregations (with rollups or cubes) are not allowed if map-side " +
    " aggregation is turned off. Set hive.map.aggr=true if you want to use grouping sets"),
  HIVE_GROUPING_SETS_AGGR_EXPRESSION_INVALID(10210,
    "Grouping sets aggregations (with rollups or cubes) are not allowed if aggregation function " +
    "parameters overlap with the aggregation functions columns"),

  HIVE_GROUPING_SETS_AGGR_NOFUNC(10211,
    "Grouping sets aggregations are not allowed if no aggregation function is presented"),

  HIVE_UNION_REMOVE_OPTIMIZATION_NEEDS_SUBDIRECTORIES(10212,
    "In order to use hive.optimize.union.remove, the hadoop version that you are using " +
    "should support sub-directories for tables/partitions. If that is true, set " +
    "hive.hadoop.supports.subdirectories to true. Otherwise, set hive.optimize.union.remove " +
    "to false"),

  HIVE_GROUPING_SETS_EXPR_NOT_IN_GROUPBY(10213,
    "Grouping sets expression is not in GROUP BY key"),
  INVALID_PARTITION_SPEC(10214, "Invalid partition spec specified"),
  ALTER_TBL_UNSET_NON_EXIST_PROPERTY(10215,
    "Please use the following syntax if not sure " +
    "whether the property existed or not:\n" +
    "ALTER TABLE tableName UNSET TBLPROPERTIES IF EXISTS (key1, key2, ...)\n"),
  ALTER_VIEW_AS_SELECT_NOT_EXIST(10216,
    "Cannot ALTER VIEW AS SELECT if view currently does not exist\n"),
  REPLACE_VIEW_WITH_PARTITION(10217,
    "Cannot replace a view with CREATE VIEW or REPLACE VIEW or " +
    "ALTER VIEW AS SELECT if the view has partitions\n"),
  EXISTING_TABLE_IS_NOT_VIEW(10218,
    "Existing table is not a view\n"),
  NO_SUPPORTED_ORDERBY_ALLCOLREF_POS(10219,
    "Position in ORDER BY is not supported when using SELECT *"),
  INVALID_POSITION_ALIAS_IN_GROUPBY(10220,
    "Invalid position alias in Group By\n"),
  INVALID_POSITION_ALIAS_IN_ORDERBY(10221,
    "Invalid position alias in Order By\n"),

  HIVE_GROUPING_SETS_THRESHOLD_NOT_ALLOWED_WITH_SKEW(10225,
    "An additional MR job is introduced since the number of rows created per input row " +
    "due to grouping sets is more than hive.new.job.grouping.set.cardinality. There is no need " +
    "to handle skew separately. set hive.groupby.skewindata to false."),
  HIVE_GROUPING_SETS_THRESHOLD_NOT_ALLOWED_WITH_DISTINCTS(10226,
    "An additional MR job is introduced since the cardinality of grouping sets " +
    "is more than hive.new.job.grouping.set.cardinality. This functionality is not supported " +
    "with distincts. Either set hive.new.job.grouping.set.cardinality to a high number " +
    "(higher than the number of rows per input row due to grouping sets in the query), or " +
    "rewrite the query to not use distincts."),

  OPERATOR_NOT_ALLOWED_WITH_MAPJOIN(10227,
    "Not all clauses are supported with mapjoin hint. Please remove mapjoin hint."),

  ANALYZE_TABLE_NOSCAN_NON_NATIVE(10228, "ANALYZE TABLE NOSCAN cannot be used for "
      + "a non-native table"),

  ANALYZE_TABLE_PARTIALSCAN_NON_NATIVE(10229, "ANALYZE TABLE PARTIALSCAN cannot be used for "
      + "a non-native table"),
  ANALYZE_TABLE_PARTIALSCAN_NON_RCFILE(10230, "ANALYZE TABLE PARTIALSCAN doesn't "
      + "support non-RCfile. "),
  ANALYZE_TABLE_PARTIALSCAN_EXTERNAL_TABLE(10231, "ANALYZE TABLE PARTIALSCAN "
      + "doesn't support external table: "),
  ANALYZE_TABLE_PARTIALSCAN_AGGKEY(10232, "Analyze partialscan command "
            + "fails to construct aggregation for the partition "),
  ANALYZE_TABLE_PARTIALSCAN_AUTOGATHER(10233, "Analyze partialscan is not allowed " +
            "if hive.stats.autogather is set to false"),
  PARTITION_VALUE_NOT_CONTINUOUS(10234, "Parition values specifed are not continuous." +
            " A subpartition value is specified without specififying the parent partition's value"),
  TABLES_INCOMPATIBLE_SCHEMAS(10235, "Tables have incompatible schemas and their partitions " +
            " cannot be exchanged."),

  TRUNCATE_COLUMN_INDEXED_TABLE(10236, "Can not truncate columns from table with indexes"),
  TRUNCATE_COLUMN_NOT_RC(10237, "Only RCFileFormat supports column truncation."),
  TRUNCATE_COLUMN_ARCHIVED(10238, "Column truncation cannot be performed on archived partitions."),
  TRUNCATE_BUCKETED_COLUMN(10239,
      "A column on which a partition/table is bucketed cannot be truncated."),
  TRUNCATE_LIST_BUCKETED_COLUMN(10240,
      "A column on which a partition/table is list bucketed cannot be truncated."),

  TABLE_NOT_PARTITIONED(10241, "Table {0} is not a partitioned table", true),
  DATABSAE_ALREADY_EXISTS(10242, "Database {0} already exists", true),
  CANNOT_REPLACE_COLUMNS(10243, "Replace columns is not supported for table {0}. SerDe may be incompatible.", true),
  BAD_LOCATION_VALUE(10244, "{0}  is not absolute.  Please specify a complete absolute uri."),
  UNSUPPORTED_ALTER_TBL_OP(10245, "{0} alter table options is not supported"),
  INVALID_BIGTABLE_MAPJOIN(10246, "{0} table chosen for streaming is not valid", true),
  MISSING_OVER_CLAUSE(10247, "Missing over clause for function : "),
  PARTITION_SPEC_TYPE_MISMATCH(10248, "Cannot add partition column {0} of type {1} as it cannot be converted to type {2}", true),
  UNSUPPORTED_SUBQUERY_EXPRESSION(10249, "Unsupported SubQuery Expression"),
  INVALID_SUBQUERY_EXPRESSION(10250, "Invalid SubQuery expression"),

  INVALID_HDFS_URI(10251, "{0} is not a hdfs uri", true),
  INVALID_DIR(10252, "{0} is not a directory", true),
  NO_VALID_LOCATIONS(10253, "Could not find any valid location to place the jars. " +
      "Please update hive.jar.directory or hive.user.install.directory with a valid location", false),
  UNSUPPORTED_AUTHORIZATION_PRINCIPAL_TYPE_GROUP(10254,
      "Principal type GROUP is not supported in this authorization setting", "28000"),
  INVALID_TABLE_NAME(10255, "Invalid table name {0}", true),
  INSERT_INTO_IMMUTABLE_TABLE(10256, "Inserting into a non-empty immutable table is not allowed"),
  UNSUPPORTED_AUTHORIZATION_RESOURCE_TYPE_GLOBAL(10257,
      "Resource type GLOBAL is not supported in this authorization setting", "28000"),
  UNSUPPORTED_AUTHORIZATION_RESOURCE_TYPE_COLUMN(10258,
      "Resource type COLUMN is not supported in this authorization setting", "28000"),

  TXNMGR_NOT_SPECIFIED(10260, "Transaction manager not specified correctly, " +
      "set hive.txn.manager"),
  TXNMGR_NOT_INSTANTIATED(10261, "Transaction manager could not be " +
      "instantiated, check hive.txn.manager"),
  TXN_NO_SUCH_TRANSACTION(10262, "No record of transaction {0} could be found, " +
      "may have timed out", true),
  TXN_ABORTED(10263, "Transaction manager has aborted the transaction {0}.", true),
  DBTXNMGR_REQUIRES_CONCURRENCY(10264,
      "To use DbTxnManager you must set hive.support.concurrency=true"),
  TXNMGR_NOT_ACID(10265, "This command is not allowed on an ACID table {0}.{1} with a non-ACID transaction manager", true),

  LOCK_NO_SUCH_LOCK(10270, "No record of lock {0} could be found, " +
      "may have timed out", true),
  LOCK_REQUEST_UNSUPPORTED(10271, "Current transaction manager does not " +
      "support explicit lock requests.  Transaction manager:  "),

  METASTORE_COMMUNICATION_FAILED(10280, "Error communicating with the " +
      "metastore"),
  METASTORE_COULD_NOT_INITIATE(10281, "Unable to initiate connection to the " +
      "metastore."),
  INVALID_COMPACTION_TYPE(10282, "Invalid compaction type, supported values are 'major' and " +
      "'minor'"),
  NO_COMPACTION_PARTITION(10283, "You must specify a partition to compact for partitioned tables"),
  TOO_MANY_COMPACTION_PARTITIONS(10284, "Compaction can only be requested on one partition at a " +
      "time."),
  DISTINCT_NOT_SUPPORTED(10285, "Distinct keyword is not support in current context"),

  UPDATEDELETE_PARSE_ERROR(10290, "Encountered parse error while parsing rewritten update or " +
      "delete query"),
  UPDATEDELETE_IO_ERROR(10291, "Encountered I/O error while parsing rewritten update or " +
      "delete query"),
  UPDATE_CANNOT_UPDATE_PART_VALUE(10292, "Updating values of partition columns is not supported"),
  INSERT_CANNOT_CREATE_TEMP_FILE(10293, "Unable to create temp file for insert values "),
  ACID_OP_ON_NONACID_TXNMGR(10294, "Attempt to do update or delete using transaction manager that" +
      " does not support these operations."),
  NO_INSERT_OVERWRITE_WITH_ACID(10295, "INSERT OVERWRITE not allowed on table with OutputFormat " +
      "that implements AcidOutputFormat while transaction manager that supports ACID is in use"),
  VALUES_TABLE_CONSTRUCTOR_NOT_SUPPORTED(10296,
      "Values clause with table constructor not yet supported"),
  ACID_OP_ON_NONACID_TABLE(10297, "Attempt to do update or delete on table {0} that does not use " +
      "an AcidOutputFormat or is not bucketed", true),
  ACID_NO_SORTED_BUCKETS(10298, "ACID insert, update, delete not supported on tables that are " +
      "sorted, table {0}", true),
  ALTER_TABLE_TYPE_PARTIAL_PARTITION_SPEC_NO_SUPPORTED(10299,
      "Alter table partition type {0} does not allow partial partition spec", true),
  ALTER_TABLE_PARTITION_CASCADE_NOT_SUPPORTED(10300,
      "Alter table partition type {0} does not support cascade", true),

  DROP_NATIVE_FUNCTION(10301, "Cannot drop native function"),
  UPDATE_CANNOT_UPDATE_BUCKET_VALUE(10302, "Updating values of bucketing columns is not supported.  Column {0}.", true),
  IMPORT_INTO_STRICT_REPL_TABLE(10303,"Non-repl import disallowed against table that is a destination of replication."),
  CTAS_LOCATION_NONEMPTY(10304, "CREATE-TABLE-AS-SELECT cannot create table with location to a non-empty directory."),
  CTAS_CREATES_VOID_TYPE(10305, "CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field: "),
  TBL_SORTED_NOT_BUCKETED(10306, "Destination table {0} found to be sorted but not bucketed.", true),
  //{2} should be lockid
  LOCK_ACQUIRE_TIMEDOUT(10307, "Lock acquisition for {0} timed out after {1}ms.  {2}", true),
  COMPILE_LOCK_TIMED_OUT(10308, "Attempt to acquire compile lock timed out.", true),
  CANNOT_CHANGE_SERDE(10309, "Changing SerDe (from {0}) is not supported for table {1}. File format may be incompatible", true),
  CANNOT_CHANGE_FILEFORMAT(10310, "Changing file format (from {0}) is not supported for table {1}", true),
  CANNOT_REORDER_COLUMNS(10311, "Reordering columns is not supported for table {0}. SerDe may be incompatible", true),
  CANNOT_CHANGE_COLUMN_TYPE(10312, "Changing from type {0} to {1} is not supported for column {2}. SerDe may be incompatible", true),
  REPLACE_CANNOT_DROP_COLUMNS(10313, "Replacing columns cannot drop columns for table {0}. SerDe may be incompatible", true),
  REPLACE_UNSUPPORTED_TYPE_CONVERSION(10314, "Replacing columns with unsupported type conversion (from {0} to {1}) for column {2}. SerDe may be incompatible", true),
  HIVE_GROUPING_SETS_AGGR_NOMAPAGGR_MULTIGBY(10315,
      "Grouping sets aggregations (with rollups or cubes) are not allowed when " +
      "HIVEMULTIGROUPBYSINGLEREDUCER is turned on. Set hive.multigroupby.singlereducer=false if you want to use grouping sets"),
  CANNOT_RETRIEVE_TABLE_METADATA(10316, "Error while retrieving table metadata"),
  CANNOT_DROP_INDEX(10317, "Error while dropping index"),
  INVALID_AST_TREE(10318, "Internal error : Invalid AST"),
  ERROR_SERIALIZE_METASTORE(10319, "Error while serializing the metastore objects"),
  IO_ERROR(10320, "Error while peforming IO operation "),
  ERROR_SERIALIZE_METADATA(10321, "Error while serializing the metadata"),
  INVALID_LOAD_TABLE_FILE_WORK(10322, "Invalid Load Table Work or Load File Work"),
  CLASSPATH_ERROR(10323, "Classpath error"),
  IMPORT_SEMANTIC_ERROR(10324, "Import Semantic Analyzer Error"),

Member Author

@rxin @cloud-fan @liancheng @yhuai Do you think we can open an umbrella JIRA for the whole community to track whether the same or similar error messages should be issued by Spark SQL? That could help us find all the potential holes and improve the code quality.

@SparkQA

SparkQA commented Jun 24, 2016

Test build #61154 has finished for PR 13756 at commit 53417f1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ElementwiseProduct @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • class Normalizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • class PolynomialExpansion @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • public class JavaPackage
    • case class StreamingRelationExec(sourceName: String, output: Seq[Attribute]) extends LeafExecNode

@SparkQA

SparkQA commented Jun 24, 2016

Test build #61153 has finished for PR 13756 at commit c0e7e0c.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Jun 24, 2016

Test build #61179 has finished for PR 13756 at commit 53417f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ElementwiseProduct @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • class Normalizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • class PolynomialExpansion @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • public class JavaPackage
    • case class StreamingRelationExec(sourceName: String, output: Seq[Attribute]) extends LeafExecNode

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Jul 1, 2016

Test build #61634 has finished for PR 13756 at commit 53417f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ElementwiseProduct @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • class Normalizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • class PolynomialExpansion @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • public class JavaPackage
    • case class StreamingRelationExec(sourceName: String, output: Seq[Attribute]) extends LeafExecNode

@gatorsmile
Member Author

ping @liancheng @cloud-fan

/**
* Create a table, returning either a [[CreateTableCommand]] or a
* [[CreateHiveTableAsSelectLogicalPlan]].
* Create a table, returning either a [[CreateTableCommand]], a
Contributor

Nit: Remove "either".


case c: CreateTableCommand =>
val allColNamesInSchema = c.table.schema.map(_.name)
val colNames = allColNamesInSchema.diff(c.table.partitionColumnNames)
Contributor

Is it safe to do a case-sensitive comparison here?
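
To illustrate the concern, here is a minimal sketch, assuming a column named `Year` in the schema and a partition column spelled `year`. `Seq.diff` compares strings with `==`, so the two spellings are treated as different columns. The `ColumnNameCheck` object, its `Resolver` alias, and `nonPartitionColumns` are hypothetical names made up for this example, not code from the PR or from Spark.

```scala
// Hypothetical illustration only; none of these names come from the PR.
object ColumnNameCheck {
  // A name-equality function, so that case sensitivity is an explicit choice.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  /** Schema columns that do not match any partition column under `resolver`. */
  def nonPartitionColumns(
      schemaCols: Seq[String],
      partitionCols: Seq[String],
      resolver: Resolver): Seq[String] =
    schemaCols.filterNot(c => partitionCols.exists(p => resolver(c, p)))

  def main(args: Array[String]): Unit = {
    val schemaCols = Seq("id", "Year", "value")
    val partitionCols = Seq("year")

    // Seq.diff relies on ==, so "Year" is NOT removed by "year".
    println(schemaCols.diff(partitionCols))

    // With a case-insensitive resolver, "Year" is treated as the partition column.
    println(nonPartitionColumns(schemaCols, partitionCols, caseInsensitive))
  }
}
```

Whether the case-insensitive behaviour is the right one here would depend on the session's case-sensitivity setting, so the comparison function is passed in rather than hard-coded.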

@SparkQA

SparkQA commented Jul 5, 2016

Test build #61770 has finished for PR 13756 at commit 08b5374.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Jul 5, 2016

Test build #61785 has finished for PR 13756 at commit 08b5374.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Jul 12, 2016

Test build #62182 has finished for PR 13756 at commit 08b5374.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @liancheng @cloud-fan

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Jul 23, 2016

Test build #62746 has finished for PR 13756 at commit 08b5374.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Jul 28, 2016

Test build #62985 has finished for PR 13756 at commit 08b5374.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

This is now part of #14482. Closing it now.

@gatorsmile gatorsmile closed this Aug 4, 2016
