Bug Description
What happened:
CREATE INDEX IF NOT EXISTS record_index ON <table> (<record_key_col>) throws
HoodieMetadataIndexException: Index already exists: record_index when the index
already exists. The IF NOT EXISTS clause has no effect.
What you expected:
With IF NOT EXISTS, the command should be a no-op when the index already
exists — matching standard SQL semantics and matching the behavior implied by
the ignoreIfExists: Boolean field on CreateIndexCommand.
Steps to reproduce:
- Create a Hudi COW table with a record-key column (e.g. uuid).
- CREATE INDEX record_index ON tbl (uuid): succeeds.
- CREATE INDEX IF NOT EXISTS record_index ON tbl (uuid): throws HoodieMetadataIndexException: Index already exists: record_index.
Root cause
CreateIndexCommand in
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/IndexCommands.scala
parses an ignoreIfExists: Boolean field from the SQL but never propagates it
to HoodieSparkIndexClient.create(...):
} else if (indexName.equals(HoodieTableMetadataUtil.PARTITION_NAME_RECORD_INDEX)) {
  ValidationUtils.checkArgument(...)
  new HoodieSparkIndexClient(sparkSession).create(metaClient, indexName,
    HoodieTableMetadataUtil.PARTITION_NAME_RECORD_INDEX, columnsMap,
    options.asJava, table.properties.asJava)
  // ignoreIfExists is dropped here
}
HoodieSparkIndexClient.create in
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/HoodieSparkIndexClient.java
has no ignoreIfExists parameter at all, and createRecordIndex unconditionally
throws when the index exists:
String fullIndexName = PARTITION_NAME_RECORD_INDEX;
if (indexExists(metaClient, fullIndexName)) {
  throw new HoodieMetadataIndexException("Index already exists: " + userIndexName);
}
The same gap exists for the column_stats/bloom_filters/secondary-index branches —
none of the HoodieSparkIndexClient(...).create(...) call sites in
CreateIndexCommand.run pass through ignoreIfExists.
Suggested fix
- Add an ignoreIfExists: boolean parameter to HoodieSparkIndexClient.create(...).
- Pass it through from every branch in CreateIndexCommand.run.
- In createRecordIndex and createExpressionOrSecondaryIndex, return early (instead of throwing) when the index already exists and ignoreIfExists == true.
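The early-return semantics proposed above can be sketched in isolation. This is a minimal illustration, not Hudi code: IndexClientSketch and its in-memory set of index names are hypothetical stand-ins for HoodieSparkIndexClient and the metadata-table lookup done by indexExists, and IllegalStateException stands in for HoodieMetadataIndexException.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for HoodieSparkIndexClient; it only models the
// "already exists" check, not any real index building.
class IndexClientSketch {
    // Assumption: an in-memory set replaces the real indexExists(metaClient, name) lookup.
    private final Set<String> existingIndexes = new HashSet<>();

    /**
     * Creates the index unless it already exists.
     *
     * @return true if the index was created, false if creation was skipped
     *         because the index exists and ignoreIfExists is true
     */
    boolean create(String indexName, boolean ignoreIfExists) {
        if (existingIndexes.contains(indexName)) {
            if (ignoreIfExists) {
                // IF NOT EXISTS was specified: treat as a no-op.
                return false;
            }
            // Plain CREATE INDEX on an existing index still fails.
            throw new IllegalStateException("Index already exists: " + indexName);
        }
        existingIndexes.add(indexName);
        return true;
    }
}
```

With this shape, a first create("record_index", false) succeeds, a later create("record_index", true) returns without error, and a later create("record_index", false) still throws, which is the behavior the issue asks for from CREATE INDEX IF NOT EXISTS versus plain CREATE INDEX.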
Notes
- The expression/secondary-index path (createExpressionOrSecondaryIndex, HoodieSparkIndexClient.java:155-159) already silently skips re-registration when the index exists, so observable behavior between record_index and expression/secondary indexes already differs. The fix is a good opportunity to unify behavior across both paths.
- DROP INDEX IF EXISTS works correctly today: the ignoreIfNotExists: Boolean field on DropIndexCommand IS propagated to HoodieSparkIndexClient.drop(metaClient, indexName, ignoreIfNotExists). CREATE INDEX IF NOT EXISTS is the missing symmetric path.
Environment
Hudi version: 1.x (verified on 1.2; affects all releases where record_index DDL exists)
Query engine: Spark 3.3
Relevant configs: standard MDT-enabled COW table