Skip to content

[REVIEW] Add GroupBy.dtypes#12783

Merged
rapids-bot[bot] merged 8 commits intorapidsai:branch-23.04from
galipremsagar:groupby_dtypes
Feb 16, 2023
Merged

[REVIEW] Add GroupBy.dtypes#12783
rapids-bot[bot] merged 8 commits intorapidsai:branch-23.04from
galipremsagar:groupby_dtypes

Conversation

@galipremsagar
Copy link
Contributor

@galipremsagar galipremsagar commented Feb 15, 2023

Description

This PR adds dtypes property to GroupBy, this will also fix some upstream dask breaking changes introduced in: dask/dask#9889

Issue was discovered in: #12768 (comment)

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added bug Something isn't working 3 - Ready for Review Ready for review by team non-breaking Non-breaking change labels Feb 15, 2023
@galipremsagar galipremsagar self-assigned this Feb 15, 2023
@galipremsagar galipremsagar requested a review from a team as a code owner February 15, 2023 18:06
@github-actions github-actions bot added the Python Affects Python cuDF API. label Feb 15, 2023
}
)
df.index = index
return df
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much more simple than the pandas implementation, but I guess it makes sense.

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small suggestion to simplify, but otherwise LGTM.

galipremsagar and others added 2 commits February 15, 2023 12:51
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
Co-authored-by: Ashwin Srinath <3190405+shwina@users.noreply.github.com>
@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Feb 15, 2023
@galipremsagar
Copy link
Contributor Author

Currently blocked by an unrelated error:

2023-02-15T00:56:38.2183067Z [32m┌────────────────────────┐[0m
2023-02-15T00:56:38.2183513Z [32m|    pytest custreamz    |[0m
2023-02-15T00:56:38.2184134Z [32m└────────────────────────┘[0m
2023-02-15T00:56:38.2184156Z 
2023-02-15T00:56:38.2184513Z /__w/cudf/cudf/python/custreamz /__w/cudf/cudf
2023-02-15T00:56:39.2568849Z ImportError while loading conftest '/__w/cudf/cudf/python/custreamz/custreamz/tests/conftest.py'.
2023-02-15T00:56:39.2570733Z custreamz/__init__.py:3: in <module>
2023-02-15T00:56:39.2571402Z     from .kafka import Consumer
2023-02-15T00:56:39.2572094Z custreamz/kafka.py:3: in <module>
2023-02-15T00:56:39.2572830Z     from cudf_kafka._lib.kafka import KafkaDatasource
2023-02-15T00:56:39.2574920Z E   ImportError: /opt/conda/envs/test/lib/python3.8/site-packages/cudf_kafka/_lib/kafka.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3fmt2v96detail18thousands_sep_implIcEENS1_20thousands_sep_resultIT_EENS1_10locale_refE

Discussing offline to resolve this.

@vyasr
Copy link
Contributor

vyasr commented Feb 15, 2023

In the interest of unblocking CI, I've made some small changes to patch this issue. I'll need to do a bit of a deeper investigation to figure out where cudf (or perhaps arrow) is leaking this dependency from; I'm not 100% convinced that it should be needed, but I could be mistaken (in which case my current patch won't need to be modified).

@galipremsagar
Copy link
Contributor Author

CI passed, merging this PR now

@galipremsagar
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit d787ff2 into rapidsai:branch-23.04 Feb 16, 2023
rapids-bot bot pushed a commit that referenced this pull request Feb 16, 2023
The changes to spdlog/fmt packaging in rmm caused an undefined symbol issue in cudf_kafka. To unblock CI in #12783, I simply added fmt to the list of required libraries for the Python package (see caf7adf and [the explanation](#12783 (comment))). The underlying issue turns out to be that by default usage of fmt results in the dependent assuming that the headers are compiled in not header-only mode. When using CMake to manage the build, fmt relies on use of the appropriate target `fmt::fmt-header-only` to configure the build such that anything using fmt knows to use it in header-only mode, which sets the `FMT_HEADER_ONLY` preprocessor macro under the hood. Since the Python cudf_kafka package is not built using CMake, however, this information was not propagated to its build from its dependencies (libcudf/libcudf_kafka). As a result, it was compiled expecting an external definition of some fmt symbols rather than inlining them. This PR removes the undesirable library dependency on fmt introduced in #12783 in favor of properly telling fmt to expect inlined symbols.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Bradley Dice (https://github.com/bdice)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #12796
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants