**Tips before filing an issue**
- Have you gone through our FAQs? Yes, this is a code-level bug.
**Describe the problem you faced**
`org.apache.hudi.DefaultSource` has two read-side overloads of `createRelation` (sketched below):
- 2-arg (`RelationProvider`): `createRelation(sqlContext, parameters)` wraps its body in `try { … } catch { case _: HoodieSchemaNotFoundException => new EmptyRelation(…) }`. This catch was added in HUDI-7147 / PR #10689 precisely so that a schema-less Hudi table (no commits, commit metadata deleted, or a legacy schemaless layout) does not explode at query analysis time.
- 3-arg (`SchemaRelationProvider`): `createRelation(sqlContext, optParams, schema)` calls `DefaultSource.createRelation(sqlContext, metaClient, schema, options.toMap)` directly, without the same catch.
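Roughly, the shape is (bodies abbreviated; this is a paraphrase of `DefaultSource.scala`, not the exact source):

```scala
// RelationProvider path (2-arg): a schema-less table degrades to an empty relation.
def createRelation(sqlContext: SQLContext,
                   parameters: Map[String, String]): BaseRelation =
  try {
    // ... resolve the metaClient and delegate to the shared createRelation ...
  } catch {
    case _: HoodieSchemaNotFoundException => new EmptyRelation(/* ... */)
  }

// SchemaRelationProvider path (3-arg): the same failure propagates to the caller.
def createRelation(sqlContext: SQLContext,
                   optParams: Map[String, String],
                   schema: StructType): BaseRelation = {
  // ... resolve the metaClient, then:
  DefaultSource.createRelation(sqlContext, metaClient, schema, options.toMap) // no catch
}
```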
Spark's `DataSource.resolveRelation()` chooses which overload to invoke based on whether a user-supplied schema is present:
```scala
case (dataSource: SchemaRelationProvider, Some(schema)) =>
  dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions, schema)
case (dataSource: RelationProvider, _) =>
  dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)
```
Because `DefaultSource` implements both `RelationProvider` and `SchemaRelationProvider`, any read path that supplies a schema (e.g. `spark.read.schema(s).format("hudi").load(path)`, or HMS-catalog resolution that already knows the schema) bypasses the 2-arg catch and surfaces `HoodieSchemaNotFoundException` directly.
**To Reproduce**
1. Create a Hudi table with one insert commit, then delete the only completed `.commit` file in the timeline (or otherwise produce a layout where `TableSchemaResolver` cannot resolve a schema).
2. `spark.read.format("hudi").load(basePath)` → returns an empty DataFrame (works because of the 2-arg catch; see the existing test `TestCOWDataSource.testReadOfAnEmptyTable`).
3. `spark.read.schema(someSchema).format("hudi").load(basePath)` → throws `org.apache.hudi.exception.HoodieSchemaNotFoundException: No schema found for table at `.
The same scenario also reproduces when a Spark SQL query resolves a Hudi table via `HiveMetastoreCatalog` and the catalog already supplies the schema.
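End-to-end, in spark-shell (path, table name, and columns are illustrative; step 2 is done out of band):

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.types._

val basePath = "/tmp/hudi_schemaless_repro" // illustrative path

// Step 1: write a single insert commit.
spark.range(1).selectExpr("id", "id as ts")
  .write.format("hudi")
  .option("hoodie.table.name", "repro")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .mode(SaveMode.Overwrite)
  .save(basePath)

// Step 2 (out of band): delete the only completed commit file from the
// timeline under basePath/.hoodie so TableSchemaResolver cannot resolve a schema.

// Step 3: a schema-less read returns an empty DataFrame (2-arg catch)...
spark.read.format("hudi").load(basePath).show()

// ...but a user-supplied schema throws HoodieSchemaNotFoundException.
val someSchema = StructType(Seq(StructField("id", LongType), StructField("ts", LongType)))
spark.read.schema(someSchema).format("hudi").load(basePath).show()
```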
**Expected behavior**
Both overloads should treat a schema-less Hudi table identically: return an `EmptyRelation` rather than propagating `HoodieSchemaNotFoundException`. Behavior should not depend on whether the caller supplied a schema.
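In test terms (a hypothetical assertion in the spirit of `testReadOfAnEmptyTable`, continuing the snippet above):

```scala
// Expected once the 3-arg overload is guarded: an empty result, not an exception.
val df = spark.read.schema(someSchema).format("hudi").load(basePath)
assert(df.isEmpty)
```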
**Environment Description**
- Hudi version: master (also reproduces on 0.x and 1.x)
- Spark version: 3.x
- Hive version: any
- Hadoop version: any
- Storage (HDFS/S3/GCS..): any
- Running on Docker? (yes/no): no
**Additional context**
The fix is a small mirror of the existing 2-arg catch on the 3-arg overload. PR follows.
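Roughly (a sketch only; it assumes `EmptyRelation` can be built from the caller's `sqlContext` and the user-supplied `schema`; the PR will have the exact shape):

```scala
override def createRelation(sqlContext: SQLContext,
                            optParams: Map[String, String],
                            schema: StructType): BaseRelation =
  try {
    // ... existing body: resolve the metaClient, then:
    DefaultSource.createRelation(sqlContext, metaClient, schema, options.toMap)
  } catch {
    case _: HoodieSchemaNotFoundException =>
      new EmptyRelation(sqlContext, schema) // assumed constructor; mirrors the 2-arg branch
  }
```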
**Stacktrace**
```
org.apache.hudi.exception.HoodieSchemaNotFoundException: No schema found for table at
at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaInternal(TableSchemaResolver.java:...)
at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:...)
at org.apache.hudi.HoodieBaseRelation.<init>(HoodieBaseRelation.scala:...)
at org.apache.hudi.DefaultSource$.createRelation(DefaultSource.scala:...)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:137) ← 3-arg overload, no catch
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:...)
...
```