[BUG] CREATE INDEX IF NOT EXISTS record_index throws "Index already exists" — ignoreIfExists flag is silently dropped #18691

@prashantwason

Bug Description

What happened:
CREATE INDEX IF NOT EXISTS record_index ON <table> (<record_key_col>) throws
HoodieMetadataIndexException: Index already exists: record_index when the index
already exists. The IF NOT EXISTS clause has no effect.

What you expected:
With IF NOT EXISTS, the command should be a no-op when the index already
exists — matching standard SQL semantics and matching the behavior implied by
the ignoreIfExists: Boolean field on CreateIndexCommand.

Steps to reproduce:

  1. Create a Hudi COW table with a record-key column (e.g. uuid).
  2. CREATE INDEX record_index ON tbl (uuid) — succeeds.
  3. CREATE INDEX IF NOT EXISTS record_index ON tbl (uuid) — throws
    HoodieMetadataIndexException: Index already exists: record_index.

Root cause

CreateIndexCommand in
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/IndexCommands.scala
parses an ignoreIfExists: Boolean field from the SQL but never propagates it
to HoodieSparkIndexClient.create(...):

```scala
} else if (indexName.equals(HoodieTableMetadataUtil.PARTITION_NAME_RECORD_INDEX)) {
  ValidationUtils.checkArgument(...)
  new HoodieSparkIndexClient(sparkSession).create(metaClient, indexName,
      HoodieTableMetadataUtil.PARTITION_NAME_RECORD_INDEX, columnsMap,
      options.asJava, table.properties.asJava)
  // ignoreIfExists is dropped here
}
```

HoodieSparkIndexClient.create in
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/HoodieSparkIndexClient.java
has no ignoreIfExists parameter at all, and createRecordIndex unconditionally
throws when the index exists:

```java
String fullIndexName = PARTITION_NAME_RECORD_INDEX;
if (indexExists(metaClient, fullIndexName)) {
  throw new HoodieMetadataIndexException("Index already exists: " + userIndexName);
}
```

The same gap exists for the column_stats/bloom_filters/secondary-index branches —
none of the HoodieSparkIndexClient(...).create(...) call sites in
CreateIndexCommand.run pass through ignoreIfExists.

Suggested fix

  1. Add an ignoreIfExists: boolean parameter to HoodieSparkIndexClient.create(...).
  2. Pass it through from every branch in CreateIndexCommand.run.
  3. In createRecordIndex and createExpressionOrSecondaryIndex, return early
    (instead of throwing) when the index already exists and ignoreIfExists == true.
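
Steps 1 and 3 can be sketched on a simplified model. This is only an illustration of the early-return shape, not the actual patch: IndexClientSketch, its in-memory set of index names, and the boolean return value are hypothetical stand-ins, and IllegalStateException takes the place of HoodieMetadataIndexException so the snippet compiles without Hudi on the classpath.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for the index client; NOT the real HoodieSparkIndexClient.
class IndexClientSketch {
  // Names of indexes that already exist (stands in for the table's index metadata).
  private final Set<String> existingIndexes = new HashSet<>();

  /**
   * Creates the index, honoring IF NOT EXISTS.
   * Returns true if the index was created, false if creation was skipped.
   */
  boolean create(String indexName, boolean ignoreIfExists) {
    if (existingIndexes.contains(indexName)) {
      if (ignoreIfExists) {
        return false; // IF NOT EXISTS: no-op instead of an error
      }
      // The real code throws HoodieMetadataIndexException at this point.
      throw new IllegalStateException("Index already exists: " + indexName);
    }
    existingIndexes.add(indexName);
    return true;
  }
}
```

With this shape, a plain CREATE INDEX on an existing index still fails, while the IF NOT EXISTS variant returns without touching the index.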

Notes

  • The expression/secondary-index path (createExpressionOrSecondaryIndex,
    HoodieSparkIndexClient.java:155-159) already silently skips re-registration
    when the index exists, so observable behavior between record_index and
    expression/secondary indexes already differs. The fix is a good opportunity
    to unify behavior across both paths.
  • DROP INDEX IF EXISTS works correctly today: the ignoreIfNotExists: Boolean
    field on DropIndexCommand IS propagated to
    HoodieSparkIndexClient.drop(metaClient, indexName, ignoreIfNotExists).
    CREATE INDEX IF NOT EXISTS is the missing symmetric path.
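
The symmetry in the second note can be sketched as a pair of commands that each forward their flag to the client. All names below (IndexClientApi, CreateIndexCommandSketch, DropIndexCommandSketch, RecordingIndexClient) are hypothetical, and the signatures are reduced to two arguments; the real commands and HoodieSparkIndexClient take more parameters. The recording client is a test double that only shows which flags reach the client, which is the property a regression test for this bug would likely want to assert.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced client interface; not the real HoodieSparkIndexClient API.
interface IndexClientApi {
  void create(String indexName, boolean ignoreIfExists);
  void drop(String indexName, boolean ignoreIfNotExists);
}

// Stand-in for CreateIndexCommand: forwards the parsed flag instead of dropping it.
final class CreateIndexCommandSketch {
  private final String indexName;
  private final boolean ignoreIfExists;

  CreateIndexCommandSketch(String indexName, boolean ignoreIfExists) {
    this.indexName = indexName;
    this.ignoreIfExists = ignoreIfExists;
  }

  void run(IndexClientApi client) {
    client.create(indexName, ignoreIfExists);
  }
}

// Stand-in for DropIndexCommand, which per the issue already forwards its flag.
final class DropIndexCommandSketch {
  private final String indexName;
  private final boolean ignoreIfNotExists;

  DropIndexCommandSketch(String indexName, boolean ignoreIfNotExists) {
    this.indexName = indexName;
    this.ignoreIfNotExists = ignoreIfNotExists;
  }

  void run(IndexClientApi client) {
    client.drop(indexName, ignoreIfNotExists);
  }
}

// Test double that records every call together with the flag it received.
final class RecordingIndexClient implements IndexClientApi {
  final List<String> calls = new ArrayList<>();

  @Override
  public void create(String indexName, boolean ignoreIfExists) {
    calls.add("create(" + indexName + ", " + ignoreIfExists + ")");
  }

  @Override
  public void drop(String indexName, boolean ignoreIfNotExists) {
    calls.add("drop(" + indexName + ", " + ignoreIfNotExists + ")");
  }
}
```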

Environment

Hudi version: 1.x (verified on 1.2; affects all releases where record_index DDL exists)
Query engine: Spark 3.3
Relevant configs: standard MDT-enabled COW table

Labels: type:bug (Bug reports and fixes)
