Change make_strings_children to return uvector #15171

Merged
rapids-bot[bot] merged 4 commits into rapidsai:branch-24.04 from davidwendt:msc-return-uvector on Mar 1, 2024

@davidwendt
Contributor

Description

Changes the cudf::strings::detail::make_strings_children utility to return an rmm::device_uvector<char> instead of a chars column. This further helps enable large strings support, since the chars are no longer stored in a column.
This is an internal utility and so is non-breaking for any public APIs.
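
For readers unfamiliar with the offsets/chars layout, here is a minimal host-only sketch of the shape of the new return value. It is not the libcudf implementation: `make_children_sketch`, `offsets_buffer`, and `chars_buffer` are hypothetical names, and `std::vector` stands in for both the offsets column and `rmm::device_uvector<char>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Host-only stand-ins (assumptions, not libcudf types):
// chars_buffer plays the role of rmm::device_uvector<char>,
// offsets_buffer plays the role of the offsets child column.
using chars_buffer   = std::vector<char>;
using offsets_buffer = std::vector<std::int64_t>;

// Hypothetical helper: builds offsets and a flat chars buffer from host strings
// and returns both as owning objects in a pair.
std::pair<offsets_buffer, chars_buffer> make_children_sketch(
  std::vector<std::string> const& rows)
{
  // offsets[i] is the starting byte of row i; offsets.back() is the total byte count.
  offsets_buffer offsets(rows.size() + 1, 0);
  for (std::size_t i = 0; i < rows.size(); ++i) {
    offsets[i + 1] = offsets[i] + static_cast<std::int64_t>(rows[i].size());
  }

  // The chars buffer is sized directly from the byte count.
  chars_buffer chars(static_cast<std::size_t>(offsets.back()));
  for (std::size_t i = 0; i < rows.size(); ++i) {
    std::copy(rows[i].begin(), rows[i].end(), chars.begin() + offsets[i]);
  }

  // Both buffers are moved into the pair so ownership transfers to the caller.
  return std::pair(std::move(offsets), std::move(chars));
}
```

A caller can unpack the result with structured bindings, for example `auto [offsets, chars] = make_children_sketch({"ab", "", "cde"});`.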

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt added the 2 - In Progress, libcudf, improvement, and non-breaking labels on Feb 28, 2024
@davidwendt self-assigned this on Feb 28, 2024
@davidwendt added the 3 - Ready for Review label and removed the 2 - In Progress label on Feb 28, 2024
@davidwendt marked this pull request as ready for review on February 28, 2024 20:29
@davidwendt requested a review from a team as a code owner on February 28, 2024 20:29
Member

@PointKernel left a comment

Looks great with a non-blocking comment on the function return type.

  // Now build the chars column
- std::unique_ptr<column> chars_column =
-   create_chars_child_column(static_cast<size_type>(bytes), stream, mr);
+ rmm::device_uvector<char> chars(bytes, stream, mr);
Member

Preferably this would return a unique pointer to the uvector, in line with the developer guide:

libcudf functions typically take views as input (`column_view` or `table_view`)
and produce `unique_ptr`s to owning objects as output. For example,
```c++
std::unique_ptr<table> sort(table_view const& input);
```

I also noticed that #14202 changed the input argument of make_strings_column from a unique_ptr to an rvalue reference, which IMO obscures the ownership semantics.

  }

- return std::pair(std::move(offsets_column), std::move(chars_column));
+ return std::pair(std::move(offsets_column), std::move(chars));
Member

Suggested change
- return std::pair(std::move(offsets_column), std::move(chars));
+ return std::pair(std::move(offsets_column), chars);

nit: no need to move if it's just a uvector

Contributor Author

Without the std::move the uvector is copied.
https://godbolt.org/z/K1sTfP1na

Member

I thought copy elision would happen automatically with C++17, based on multiple discussions in our cpp channel and on StackOverflow. After taking a closer look at your sample code, I realized that returning a pair makes a difference (see here).
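
The point of this thread can be reproduced in plain standard C++17 without rmm. The sketch below uses a hypothetical `tracked_buffer` type that counts copy and move constructions; it is an illustration only, not the code behind the godbolt link. Returning a named local directly never copies, but passing a named local to `std::pair(...)` treats it as an lvalue, so the pair element is copy-constructed unless `std::move` is used.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for an owning buffer; it only counts how it is constructed.
struct tracked_buffer {
  static inline int copies = 0;
  static inline int moves  = 0;
  tracked_buffer() = default;
  tracked_buffer(tracked_buffer const&) { ++copies; }
  tracked_buffer(tracked_buffer&&) noexcept { ++moves; }
};

// Returning a named local by itself: the compiler elides the copy (NRVO)
// or falls back to an implicit move, so no copy is made.
tracked_buffer return_local()
{
  tracked_buffer buf;
  return buf;
}

// Wrapping the local in a pair without std::move: `buf` is an lvalue argument,
// so the pair's second element is copy-constructed from it.
std::pair<int, tracked_buffer> return_pair_copy()
{
  tracked_buffer buf;
  return std::pair(0, buf);
}

// With std::move, the pair's second element is move-constructed instead.
std::pair<int, tracked_buffer> return_pair_move()
{
  tracked_buffer buf;
  return std::pair(0, std::move(buf));
}
```

In both pair-returning functions the pair itself is a prvalue, so C++17 guaranteed elision applies to the pair; the difference is only in how its element is constructed from the local.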

@davidwendt
Contributor Author

/merge

@rapids-bot merged commit f911ce8 into rapidsai:branch-24.04 on Mar 1, 2024
@davidwendt deleted the msc-return-uvector branch on May 9, 2024 12:19
Labels

3 - Ready for Review, improvement, libcudf, non-breaking
