Hide visibility of non-public symbols#1644
Hide visibility of non-public symbols#1644rapids-bot[bot] merged 16 commits intorapidsai:branch-24.10from jameslamb:symbol-visibility
Conversation
bdice
left a comment
There was a problem hiding this comment.
To the best of my knowledge, this is good! If you removed the RMM_EXPORT from a header, would the rmm Python builds or tests fail?
Good question! I just tried removing the Then tried removing ALL of the Maybe this is because the Python package only calls into the DSOs through public symbols generated by Cython, and those are not affected by this CMake change? I'm going to try pushing a commit with those removals here, let's see what other CI jobs say. |
|
My hope is that we can test the accuracy of our exports within RMM's CI rather than relying on cuDF to tell RMM that it broke. |
This reverts commit e9482bf.
Right, totally understand and I agree. Unfortunately it looks like even removing those, everything passed: https://github.com/rapidsai/rmm/actions/runs/10326031645?pr=1644. So I just pushed 1c39756 reverting this PR back to its original state. Something about CI will have to change here if we want feedback on the exports working correctly within rmm's CI. This is way at the edge of my understanding, so the only thing I can think of is to dump the symbol table for every DSO and pass it through some text processing (e.g. we shouldn't see any WEAK / GLOBAL I'm certain someone else will have a better idea than that for how to express the behavior we want in CI here. cc @robertmaynard @KyleFromNVIDIA |
|
Can you please open an RMM issue for this? Thanks. |
|
As for testing, could add a separate test that links spdlog and RMM and make sure it compiles without symbol conflicts? |
vyasr
left a comment
There was a problem hiding this comment.
There are some issues that we need to address about how symbol visibility works for a header-only library like rmm.
- The C++ changes here don't look right to me. They're matching what we're planning to use in other libraries like libcudf, but there is no librmm DSO. By decorating every rmm C++ public API with
RMM_EXPORT, we have attributes that take priority over the top-level build options specified when compiling any other library that includes rmm headers. IOW, with this PR's application ofRMM_EXPORTspecifying-fvisibility=hiddenin libcudf's build will no longer prevent libcudf from exporting rmm symbols because theRMM_EXPORTattributes will take precedence. I think the only case where rmm should be including an explicit visibility attribute is the preexisting application toget_mapbecause in that case we do explicitly need to indicate that the symbol needs to be uniquified across all DSOs in the process space. The other attributes added in this PR should be removed. - Python extension modules aren't built to be linked to by other libraries, so there's no linking-based check that we can do to ensure that symbols are being hidden in the Python extension modules the way that is requested above. A direct grep of nm/readelf output is the best we can do I think.
- When creating extension modules, the macros used to define the entry points have their own visibility markup that ensures that those symbols are made visible, so we don't need to worry about those.
- Rather than setting global variables in CMakeLists.txt, we should set targets on properties inside
rapids-cmake. You want properties like this one.rapids_cython_create_modulesknows whether a module is being compiled for C or C++, so nthe appropriate property can be set on a per-target basis.
I have a slightly different proposal that'd be in the spirit of this without requiring visibility-specific arguments in the signature of
So without any change changes to Like this... foreach(_cython_target IN LISTS RAPIDS_CYTHON_CREATED_TARGETS)
set_target_properties(
${_cython_target}
PROPERTIES
C_VISIBILITY_PRESET hidden
CXX_VISIBILITY_PRESET hidden
)
endforeach()... after each call to Doing it that way here (and in other similar projects) would allow us to test this behavior without expanding |
Yes. It doesn't show up in the issues for an RMM release. And I don't get a notification showing me the work needed/being done on the library. I choose to follow @jarmak-nv 's suggested process of having an issue for all work done on RMM. It helps me track, plan, and report. |
If they are structs and not objects with vtables you are safe. The RTTI for hidden symbols can have the DSO embedded into it and therefore isn't safe across boundaries as you won't be able to |
|
I just tested the ref: rapidsai/cudf#15483 (comment) I've also proposed a new testing script implementing the idea from discussion above to try to add a CI-time check on symbol visibility. It shows up like this in wheel build logs: Tested that locally and found that it did successfully fail when I omitted the visibility attributes from some targets. I've hard-coded I think this is ready for another review. |
@jameslamb Could you open a rapids-cmake issue to track this feature request? I happy with using the current |
Sure! here you go: rapidsai/rapids-cmake#672
Thanks! I agree. |
When we created build-planning we (including you and @jarmak-nv) specifically discussed this use case as one where the fragmentation associated with creating per-repo issues wasn't worth the cost. Without a centralized point of discussion (e.g. a build-planning issue) we end up with a comment on a cugraph issue pointing to a solution discussed in a combination of posts on a cuml issue and a raft PR (not an exaggeration, this kind of cross-repo discussion is fairly common without a centralized place for discussion). If an rmm issue is a must-have for you, could we compromise on keeping those issues as near-empty issues that just link to a build-planning issue? I realize that creating the issues on other repos isn't hard, especially with rapids-reviser, but that doesn't address the IMO much larger concern of fragmentation. |
I'm fine with doing it this way for now, but I think that in the long run we should upstream the functionality. The purpose of
Hmm yeah I thought briefly about the vtables but didn't get far enough as to consider how they would be affected by crossing ABI boundaries. How is this made safe in general? Are vtables always embedded as unique symbols so that they can be deduplicated across DSOs? What about class methods? If rmm symbols corresponding to its classes are made public, and if one of those classes has some virtual function, could you end up with the same function defined in multiple DSOs (say libcuml and libcuvs) such that the two end up causing runtime lookup conflicts due to the symbols being identical? |
vyasr
left a comment
There was a problem hiding this comment.
Some small suggestions, but nothing blocking. Address as you see fit. I discussed some things with Robert offline and am happy with the current state, recognizing that we may have to make more changes in the future.
|
@harrism could you please review one more time? I significantly changed the approach after your first approval, don't want you to feel like I'm sneaking something in here. |
harrism
left a comment
There was a problem hiding this comment.
Much better. I like that it's not touching the C++ code.
|
Ok thanks for the reviews! I'm going to merge this so we can continue with rapidsai/cudf#15483.
|
|
/merge |
Description
Fixes #1645
Contributes to rapidsai/build-planning#33
Similar to rapidsai/cudf#15982
Proposes more tightly controlling the visibility of symbols in the shared libraries produces for the
rmmPython library, via the following:-fvisibility=hiddenby defaultrmm(everything in thermm::namespace) with__attribute__((visibility("default")))Benefits of this change
Reduces the risk of symbol conflicts when
rmmis used alongside other libraries. For example, see this case incudfwhere thespdlog::symbols inrmmare conflicting with thespdlog::symbols innvcomp: rapidsai/cudf#15483 (comment)Reduces library size by a bit (around 0.3 MB uncompressed), by reducing the size of symbol tables in DSOs.
Notes for Reviewers
This is at the very edge of my C++ knowledge, apologies in advance if I've missed something obvious 😬
Checklist
How I tested this
Re-used the wheels produced from this PR over in
cudfand sawcudf's tests succeed where previously the had failed because of symbol conflicts: rapidsai/cudf#15483 (comment)Also tested this locally, read on...
On an x86_64 machine with CUDA 12.2, checked out the latest
24.10ofrmmand the branch ofcudffrom rapidsai/cudf#15483 (which is up to date with the latest24.10ofcudf).Built
librmm,rmm,libcudf, andcudfwheels from source, in the same CUDA 12, Python 3.11 image used for wheel building CI.how I did that (click me)
Created a script,
build-all.sh, with the following.Ran it like this:
docker run \ --rm \ -v $(pwd):/usr/local/src \ -w /usr/local/src \ --env PIP_CONSTRAINT=/usr/local/src/build-constraints.txt \ --env WHEEL_DIR=/usr/local/src/dist \ -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.11 \ bash # build all the wheels ./build-all.shInstalled the wheels produced by that script, using one of the testing images used across RAPIDS CI.
how I did that (click me)
docker run \ --rm \ --gpus 1 \ --env WHEEL_DIR=/usr/local/src/dist \ -v $(pwd):/usr/local/src \ -w /usr/local/src \ -it rapidsai/ci-conda:cuda12.2.2-ubuntu22.04-py3.11 \ bash # install everything pip install ${WHEEL_DIR}/*.whlSaw that fail with exactly the same symbol-conflict issue discussed in rapidsai/cudf#15483 (comment)..
Repeated that process with the changes you see in this PR, and observed:
rmm's Python tests passingcudf😁Also inspected the symbol tables with
readelf. For example, here are the counts of symbols by visibility inmemory_resource.cpython-311-x86_64-linux-gnu.so:It also helped reduce the sizes a bit
how I did that (click me)
Unzipped the
rmmwheel.rm -rf ./delete-me mkdir -p ./delete-me unzip -d ./delete-me ./dist/rmm_cu12*.whlChecked the symbol tables with
readelfreadelf --symbols --wide \ ./delete-me/rmm/_lib/memory_resource.cpython-311-x86_64-linux-gnu.so \ | c++filt \ > syms.txt echo "GLOBAL: $(grep --count -E ' GLOBAL ' < ./syms.txt)" echo "WEAK: $(grep --count -E ' WEAK ' < ./syms.txt)" echo "LOCAL: $(grep --count -E ' LOCAL ' < ./syms.txt)"Checked the sizes with
pydistcheckWhere did
RMM_EXPORTcome from?It's been in
rmmfor a few years:rmm/include/rmm/detail/export.hpp
Lines 19 to 26 in 975c911
But only used in one place, from #833.
References
I found these helpful: