8 changes: 6 additions & 2 deletions dask_bigquery/core.py
@@ -2,6 +2,7 @@

from contextlib import contextmanager
from functools import partial
from typing import List
Contributor:

Nitpick: I'm by no means a type annotation expert, but from reviewing dask/distributed#5328, I think the recommended approach for a list would be to add `from __future__ import annotations` and then just use the built-in `list` in the type annotation (i.e. `list[str]`). This isn't meant to be a blocking comment, as what you have here is valid. Just meant as an FYI.

Contributor Author:

Oh yeah, good call. I'm used to `list[]` for 3.9 but forgot about the `__future__` option.
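A minimal sketch of the style being discussed (hypothetical function; the point is that the `__future__` import turns all annotations into strings per PEP 563, so built-in generics like `list[str]` are legal even on Python 3.7/3.8):

```python
from __future__ import annotations  # postponed evaluation: list[str] is never evaluated eagerly


def select_columns(columns: list[str] = None) -> list[str]:
    # Hypothetical helper just to show the annotation style.
    return columns or []


# Annotations are stored as plain strings under PEP 563.
print(select_columns.__annotations__["columns"])  # 'list[str]'
```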


import pandas as pd
import pyarrow
@@ -88,7 +89,8 @@ def read_gbq(
project_id: str,
dataset_id: str,
table_id: str,
row_filter="",
row_filter: str = "",
columns: List[str] = None,
read_kwargs: dict = None,
):
"""Read table as dask dataframe using BigQuery Storage API via Arrow format.
@@ -104,6 +106,8 @@
BigQuery table within dataset
row_filter: str
SQL text filtering statement to pass to `row_restriction`
columns: list[str]
list of columns to load from the table
read_kwargs: dict
kwargs to pass to read_rows()

@@ -124,7 +128,7 @@ def make_create_read_session_request(row_filter=""):
read_session=bigquery_storage.types.ReadSession(
data_format=bigquery_storage.types.DataFormat.ARROW,
read_options=bigquery_storage.types.ReadSession.TableReadOptions(
row_restriction=row_filter,
row_restriction=row_filter, selected_fields=columns
),
table=table_ref.to_bqstorage(),
),
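A plain-Python analogue of what `selected_fields` accomplishes, using hypothetical example rows (the real filtering happens server-side in the BigQuery Storage API; this just illustrates the contract that each returned record is restricted to the requested columns):

```python
# Hypothetical rows standing in for a BigQuery table's records.
rows = [
    {"name": "alice", "number": 1},
    {"name": "bob", "number": 2},
]
columns = ["name"]

# Restrict every record to the requested columns, analogous to
# passing selected_fields=columns in TableReadOptions.
selected = [{field: row[field] for field in columns} for row in rows]
print(selected)  # [{'name': 'alice'}, {'name': 'bob'}]
```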
13 changes: 13 additions & 0 deletions dask_bigquery/tests/test_core.py
@@ -82,3 +82,16 @@ def test_read_kwargs(dataset, client):

with pytest.raises(Exception, match="504 Deadline Exceeded"):
ddf.compute()


def test_read_columns(df, dataset, client):
Contributor:

Overall this test looks great. One small suggestion: could we add

assert df.shape[1] > 1

to ensure that the original DataFrame has more than one column? Otherwise, the example DataFrame could later be updated to have only a single "name" column and this test would still pass (this is unlikely, but possible).

project_id, dataset_id, table_id = dataset
columns = ["name"]
ddf = read_gbq(
project_id=project_id,
dataset_id=dataset_id,
table_id=table_id,
columns=columns,
)

assert list(ddf.columns) == columns