
Commit 7d1e66d

Fix dataframe queries failing on empty datasets (#11846)
### Related

- Fixes RR-2819

### What

As per the title. The failure came from matching the query selectors against the actual dataset schema, which goes through sorbet machinery that fails when the `RowId` column is missing, as it is for an empty dataset. Fixed by short-circuiting the empty-dataset case. Also added a test which, like many others, should have been here in the first place 🤦🏻
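A minimal sketch of the failure mode and the fix, under simplifying assumptions: `Field`, `ColumnKind`, `Descriptors`, `descriptors_from_fields` and the slice-based `compute_schema_for_query` below are hypothetical stand-ins for the arrow `Schema` and `ChunkColumnDescriptors::try_from_arrow_fields`, not the real APIs. It only illustrates the control flow: descriptor parsing requires a row id column, so the empty case returns an empty schema before parsing is attempted.

```rust
// Hypothetical, simplified model. None of these types or functions are the
// real re_datafusion / re_sorbet / arrow APIs.

#[derive(Debug, Clone, PartialEq)]
enum ColumnKind {
    RowId,
    Time,
    Component,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    kind: ColumnKind,
}

#[derive(Debug, PartialEq)]
struct Descriptors {
    row_id: Field,
    others: Vec<Field>,
}

/// Stand-in for descriptor parsing: it insists on a row id column,
/// which an empty dataset does not have.
fn descriptors_from_fields(fields: &[Field]) -> Result<Descriptors, String> {
    let row_id = fields
        .iter()
        .find(|f| f.kind == ColumnKind::RowId)
        .cloned()
        .ok_or_else(|| "missing RowId column".to_owned())?;
    let others = fields
        .iter()
        .filter(|f| f.kind != ColumnKind::RowId)
        .cloned()
        .collect();
    Ok(Descriptors { row_id, others })
}

/// Returns the output "schema" (modelled here as a list of fields).
fn compute_schema_for_query(dataset_fields: &[Field]) -> Result<Vec<Field>, String> {
    // Short circuit: an empty dataset has no columns at all, so there is
    // nothing to match selectors against. Return an empty schema instead of
    // letting descriptor parsing fail on the missing row id.
    if dataset_fields.is_empty() {
        return Ok(Vec::new());
    }
    let descriptors = descriptors_from_fields(dataset_fields)?;
    // The real code would match query selectors here; this sketch just
    // returns the non-row-id columns.
    Ok(descriptors.others)
}
```

Note that the guard only covers the fully empty schema: a non-empty schema that somehow lacks a row id still surfaces the parsing error, which keeps genuinely malformed schemas visible.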
1 parent c62c77f commit 7d1e66d


2 files changed: +28 -0 lines changed


crates/store/re_datafusion/src/dataframe_query_common.rs

Lines changed: 6 additions & 0 deletions
```diff
@@ -308,6 +308,12 @@ fn compute_schema_for_query(
     dataset_schema: &Schema,
     query_expression: &QueryExpression,
 ) -> Result<SchemaRef, DataFusionError> {
+    // Short circuit for empty datasets. Needed because `ChunkColumnDescriptors::try_from_arrow_fields`
+    // needs row ids, which we only have for non-empty datasets.
+    if dataset_schema.fields.is_empty() {
+        return Ok(Arc::new(Schema::empty()));
+    }
+
     // Schema returned from `get_dataset_schema` does not match the required ChunkColumnDescriptors ordering
     // which is row id, then time, then data. We don't need perfect ordering other than that.
     let mut fields = dataset_schema
```
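The context comment says fields only need to be ordered as row id, then time, then data. One way to get exactly that ordering is a stable sort on a column-kind key. The sketch below assumes a hypothetical `ColumnKind` tag on each field (the real code works on arrow fields and their metadata) and is not the actual re_datafusion implementation.

```rust
// Hypothetical model: each column carries a kind tag. Deriving `Ord` on the
// enum orders variants by declaration: RowId < Time < Data.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ColumnKind {
    RowId,
    Time,
    Data,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: &'static str,
    kind: ColumnKind,
}

/// Reorders fields as row id, then time, then data.
/// `sort_by_key` is a stable sort, so columns of the same kind keep their
/// original relative order; no finer ordering is imposed.
fn order_for_descriptors(mut fields: Vec<Field>) -> Vec<Field> {
    fields.sort_by_key(|f| f.kind);
    fields
}
```

A stable sort fits the "no perfect ordering needed" requirement: it enforces only the coarse kind ordering and otherwise leaves the schema as returned.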
Lines changed: 22 additions & 0 deletions
```diff
@@ -0,0 +1,22 @@
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+import pytest
+
+if TYPE_CHECKING:
+    from .conftest import ServerInstance
+
+
+# TODO(ab): quite obviously, there needs to be many more tests here.
+
+
+@pytest.mark.parametrize("index", [None, "does_not_exist"])
+def test_dataframe_query_empty_dataset(index: str | None, server_instance: ServerInstance) -> None:
+    client = server_instance.client
+
+    ds = client.create_dataset("empty_dataset")
+
+    df = ds.dataframe_query_view(index=index, contents="/**").df()
+
+    assert df.count() == 0
```
