Merged
Changes from 2 commits
8 changes: 4 additions & 4 deletions README.md
@@ -42,7 +42,7 @@ growing list of algorithms. The following Python snippet reads input from a CSV
a NearestNeighbors query across a cluster of Dask workers, using multiple GPUs on a single node:


Initialize a `LocalCUDACluster` configured with [UCX](https://github.com/rapidsai/ucx-py) for fast transport of CUDA arrays
Initialize a `LocalCUDACluster` configured with [UCXX](https://github.com/rapidsai/ucxx) for fast transport of CUDA arrays
```python
# Initialize UCX for high-speed transport of CUDA arrays
from dask_cuda import LocalCUDACluster
@@ -102,8 +102,8 @@ repo](https://github.com/rapidsai/notebooks-contrib).
| **Nonlinear Models for Regression or Classification** | Random Forest (RF) Classification | Experimental multi-node multi-GPU via Dask |
| | Random Forest (RF) Regression | Experimental multi-node multi-GPU via Dask |
| | Inference for decision tree-based models | Forest Inference Library (FIL) |
| | K-Nearest Neighbors (KNN) Classification | Multi-node multi-GPU via Dask+[UCX](https://github.com/rapidsai/ucx-py), uses [Faiss](https://github.com/facebookresearch/faiss) for Nearest Neighbors Query. |
| | K-Nearest Neighbors (KNN) Regression | Multi-node multi-GPU via Dask+[UCX](https://github.com/rapidsai/ucx-py), uses [Faiss](https://github.com/facebookresearch/faiss) for Nearest Neighbors Query. |
| | K-Nearest Neighbors (KNN) Classification | Multi-node multi-GPU via Dask+[UCXX](https://github.com/rapidsai/ucxx), uses [Faiss](https://github.com/facebookresearch/faiss) for Nearest Neighbors Query. |
| | K-Nearest Neighbors (KNN) Regression | Multi-node multi-GPU via Dask+[UCXX](https://github.com/rapidsai/ucxx), uses [Faiss](https://github.com/facebookresearch/faiss) for Nearest Neighbors Query. |
| | Support Vector Machine Classifier (SVC) | |
| | Epsilon-Support Vector Regression (SVR) | |
| **Preprocessing** | Standardization, or mean removal and variance scaling / Normalization / Encoding categorical features / Discretization / Imputation of missing values / Polynomial features generation / and coming soon custom transformers and non-linear transformation | Based on Scikit-Learn preprocessing
@@ -114,7 +114,7 @@ repo](https://github.com/rapidsai/notebooks-contrib).
| | SHAP Permutation Explainer
| [Based on SHAP](https://shap.readthedocs.io/en/latest/) |
| **Execution device interoperability** | | Run estimators interchangeably from host/cpu or device/gpu with minimal code change [demo](https://docs.rapids.ai/api/cuml/stable/execution_device_interoperability.html) |
| **Other** | K-Nearest Neighbors (KNN) Search | Multi-node multi-GPU via Dask+[UCX](https://github.com/rapidsai/ucx-py), uses [Faiss](https://github.com/facebookresearch/faiss) for Nearest Neighbors Query. |
| **Other** | K-Nearest Neighbors (KNN) Search | Multi-node multi-GPU via Dask+[UCXX](https://github.com/rapidsai/ucxx), uses [Faiss](https://github.com/facebookresearch/faiss) for Nearest Neighbors Query. |

---

7 changes: 2 additions & 5 deletions ci/test_python_dask.sh
@@ -19,14 +19,11 @@ test_args=(
)

# Run tests
rapids-logger "pytest cuml-dask (No UCX-Py/UCXX)"
rapids-logger "pytest cuml-dask (No UCXX)"
timeout 2h ./ci/run_cuml_dask_pytests.sh "${test_args[@]}"

rapids-logger "pytest cuml-dask (UCX-Py only)"
timeout 5m ./ci/run_cuml_dask_pytests.sh "${test_args[@]}" --run_ucx

rapids-logger "pytest cuml-dask (UCXX only)"
timeout 5m ./ci/run_cuml_dask_pytests.sh "${test_args[@]}" --run_ucxx
timeout 5m ./ci/run_cuml_dask_pytests.sh "${test_args[@]}" --run_ucx

rapids-logger "Test script exiting with value: $EXITCODE"
exit ${EXITCODE}
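The `timeout`-wrapped invocations above can be sketched in plain Python. Everything in this snippet (the `run_with_timeout` helper, the trivial stand-in command) is illustrative and not part of the CI scripts; the 124 return code mirrors the convention of coreutils `timeout`:

```python
# Sketch of the CI pattern: run a test command under a wall-clock limit.
# coreutils `timeout` exits with status 124 when the limit is hit; we mirror that.
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd, returning its exit code, or 124 if it exceeds the limit."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, like `timeout` does.
        return 124


if __name__ == "__main__":
    # A trivial stand-in for the pytest invocation used in the CI script.
    code = run_with_timeout([sys.executable, "-c", "print('ok')"], 30)
    print("exit code:", code)
```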
2 changes: 1 addition & 1 deletion cpp/README.md
@@ -40,7 +40,7 @@ Current cmake offers the following configuration options:
| BUILD_CUML_STD_COMMS | [ON, OFF] | ON | Enable/disable building cuML NCCL+UCX communicator for running multi-node multi-GPU algorithms. Note that UCX support can also be enabled/disabled (see below). The standard communicator and MPI communicator are not mutually exclusive and can both be installed at the same time. |
| WITH_UCX | [ON, OFF] | OFF | Enable/disable UCX support in the standard cuML communicator. Algorithms requiring point-to-point messaging will not work when this is disabled. This flag is ignored if BUILD_CUML_STD_COMMS is set to OFF. |
| BUILD_CUML_MPI_COMMS | [ON, OFF] | OFF | Enable/disable building cuML MPI+NCCL communicator for running multi-node multi-GPU C++ tests. MPI communicator and STD communicator may both be installed at the same time. If OFF, it overrides BUILD_CUML_MG_TESTS to be OFF as well. |
| SINGLEGPU | [ON, OFF] | OFF | Disable all mnmg components. Disables building of all multi-GPU algorithms and all comms library components. Removes libcumlprims, UCX-py and NCCL dependencies. Overrides values of BUILD_CUML_MG_TESTS, BUILD_CUML_STD_COMMS, WITH_UCX and BUILD_CUML_MPI_COMMS. |
| SINGLEGPU | [ON, OFF] | OFF | Disable all mnmg components. Disables building of all multi-GPU algorithms and all comms library components. Removes libcumlprims, UCXX and NCCL dependencies. Overrides values of BUILD_CUML_MG_TESTS, BUILD_CUML_STD_COMMS, WITH_UCX and BUILD_CUML_MPI_COMMS. |
| DISABLE_OPENMP | [ON, OFF] | OFF | Set to `ON` to disable OpenMP |
| CMAKE_CUDA_ARCHITECTURES | List of GPU architectures, semicolon-separated | Empty | List the GPU architectures to compile the GPU targets for. Set to "NATIVE" to auto detect GPU architecture of the system, set to "ALL" to compile for all RAPIDS supported archs: ["60" "62" "70" "72" "75" "80" "86"]. |
| USE_CCACHE | [ON, OFF] | ON | Cache build artifacts with ccache. |
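As a usage sketch for the table above (cuML builds are normally driven by the repository's build scripts; this raw configure invocation, run from an assumed out-of-source build directory, is illustrative only):

```shell
# Illustrative configure step: SINGLEGPU=ON disables all multi-GPU algorithms
# and comms components, dropping the libcumlprims, UCXX and NCCL dependencies
# and overriding BUILD_CUML_MG_TESTS, BUILD_CUML_STD_COMMS, WITH_UCX and
# BUILD_CUML_MPI_COMMS as described in the table.
cmake -DSINGLEGPU=ON ..
```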
4 changes: 2 additions & 2 deletions python/cuml/README.md
@@ -31,7 +31,7 @@ example `setup.py --singlegpu`) are:
| Argument | Behavior |
| --- | --- |
| clean --all | Cleans all Python and Cython artifacts, including pycache folders, .cpp files resulting of cythonization and compiled extensions. |
| --singlegpu | Option to build cuML without multiGPU algorithms. Removes dependency on nccl, libcumlprims and ucx-py. |
| --singlegpu | Option to build cuML without multiGPU algorithms. Removes dependency on nccl, libcumlprims and ucxx. |


### RAFT Integration in cuml.raft
@@ -66,7 +66,7 @@ To build cuML's Python package, the following dependencies are required:

Packages required for multigpu algorithms*:
- libcumlprims version matching the cuML version
- ucx-py version matching the cuML version
- ucxx version matching the cuML version
- dask-cudf version matching the cuML version
- nccl>=2.5
- rapids-dask-dependency version matching the cuML version
3 changes: 1 addition & 2 deletions python/cuml/pyproject.toml
@@ -37,8 +37,7 @@ markers = [
"mg: Multi-GPU tests",
"memleak: Test that checks for memory leaks",
"no_bad_cuml_array_check: Test that should not check for bad CumlArray uses",
"ucx: Run _only_ Dask UCX-Py tests",
"ucxx: Run _only_ Dask UCXX tests",
"ucx: Run _only_ Dask UCXX tests",
]
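Assuming the `--run_ucx` option defined in `python/cuml/tests/dask/conftest.py`, a hypothetical invocation exercising only the consolidated `ucx` marker would look like:

```shell
# Illustrative only: --run_ucx makes the Dask conftest skip everything
# except tests carrying the consolidated `ucx` marker.
pytest python/cuml/tests/dask/ --run_ucx
```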

testpaths = [
46 changes: 3 additions & 43 deletions python/cuml/tests/dask/conftest.py
@@ -44,42 +44,22 @@ def client(cluster):
@pytest.fixture(scope="module")
def ucx_cluster():
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster(
protocol="ucx-old",
)
yield cluster
cluster.close()


@pytest.fixture(scope="function")
def ucx_client(ucx_cluster):
from dask.distributed import Client

client = Client(ucx_cluster)
yield client
client.close()


@pytest.fixture(scope="module")
def ucxx_cluster():
from dask_cuda import LocalCUDACluster
from dask_cuda.utils_test import IncreasedCloseTimeoutNanny

cluster = LocalCUDACluster(
protocol="ucxx",
protocol="ucx",
worker_class=IncreasedCloseTimeoutNanny,
)
yield cluster
cluster.close()


@pytest.fixture(scope="function")
def ucxx_client(ucxx_cluster):
def ucx_client(ucx_cluster):
pytest.importorskip("distributed_ucxx")
from dask.distributed import Client

client = Client(ucxx_cluster)
client = Client(ucx_cluster)
yield client
client.close()

@@ -91,13 +71,6 @@ def pytest_addoption(parser):
"--run_ucx",
action="store_true",
default=False,
help="run _only_ UCX-Py tests",
)

group.addoption(
"--run_ucxx",
action="store_true",
default=False,
help="run _only_ UCXX tests",
)

@@ -115,16 +88,3 @@ def pytest_collection_modifyitems(config, items):
for item in items:
if "ucx" in item.keywords:
item.add_marker(skip_ucx)

if config.getoption("--run_ucxx"):
skip_others = pytest.mark.skip(
reason="only runs when --run_ucxx is not specified"
)
for item in items:
if "ucxx" not in item.keywords:
item.add_marker(skip_others)
else:
skip_ucxx = pytest.mark.skip(reason="requires --run_ucxx to run")
for item in items:
if "ucxx" in item.keywords:
item.add_marker(skip_ucxx)
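With the UCX-Py path removed, the option handling above collapses to a single flag. A self-contained sketch of the resulting selection behavior follows; `FakeItem` and `select_items` are illustrative stand-ins, not cuML code:

```python
# Model of the single-flag behavior after UCX-Py removal:
# --run_ucx runs _only_ the ucx-marked Dask tests; without it they are skipped.


class FakeItem:
    """Minimal stand-in for a collected pytest item."""

    def __init__(self, name, keywords=()):
        self.name = name
        self.keywords = set(keywords)
        self.skip_reason = None  # recorded instead of pytest.mark.skip


def select_items(items, run_ucx):
    """Mirror pytest_collection_modifyitems for the consolidated ucx marker."""
    if run_ucx:
        for item in items:
            if "ucx" not in item.keywords:
                item.skip_reason = "only runs when --run_ucx is not specified"
    else:
        for item in items:
            if "ucx" in item.keywords:
                item.skip_reason = "requires --run_ucx to run"
    return items


items = [FakeItem("test_tcp"), FakeItem("test_ucx", {"ucx"})]
select_items(items, run_ucx=False)
print([i.skip_reason for i in items])
```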
65 changes: 0 additions & 65 deletions python/cuml/tests/dask/test_dask_nearest_neighbors.py
@@ -212,47 +212,6 @@ def test_compare_skl_ucx(
)


@pytest.mark.parametrize(
"nrows", [unit_param(300), quality_param(1e6), stress_param(5e8)]
)
@pytest.mark.parametrize("ncols", [10, 30])
@pytest.mark.parametrize(
"nclusters", [unit_param(5), quality_param(10), stress_param(15)]
)
@pytest.mark.parametrize(
"n_neighbors", [unit_param(10), quality_param(4), stress_param(100)]
)
@pytest.mark.parametrize(
"n_parts",
[unit_param(1), unit_param(5), quality_param(7), stress_param(50)],
)
@pytest.mark.parametrize(
"streams_per_handle,reverse_worker_order", [(5, True), (10, False)]
)
@pytest.mark.ucxx
def test_compare_skl_ucxx(
nrows,
ncols,
nclusters,
n_parts,
n_neighbors,
streams_per_handle,
reverse_worker_order,
request,
):
_test_compare_skl(
nrows,
ncols,
nclusters,
n_parts,
n_neighbors,
streams_per_handle,
reverse_worker_order,
"ucxx_client",
request,
)


def _test_batch_size(nrows, ncols, n_parts, batch_size, dask_client, request):
client = request.getfixturevalue(dask_client)

@@ -307,15 +266,6 @@ def test_batch_size_ucx(nrows, ncols, n_parts, batch_size, request):
_test_batch_size(nrows, ncols, n_parts, batch_size, "ucx_client", request)


@pytest.mark.parametrize("nrows", [unit_param(1000), stress_param(1e5)])
@pytest.mark.parametrize("ncols", [unit_param(10), stress_param(500)])
@pytest.mark.parametrize("n_parts", [unit_param(10), stress_param(100)])
@pytest.mark.parametrize("batch_size", [unit_param(100), stress_param(1e3)])
@pytest.mark.ucxx
def test_batch_size_ucxx(nrows, ncols, n_parts, batch_size, request):
_test_batch_size(nrows, ncols, n_parts, batch_size, "ucxx_client", request)


def _test_return_distance(dask_client, request):
client = request.getfixturevalue(dask_client)

@@ -357,11 +307,6 @@ def test_return_distance_ucx(request):
_test_return_distance("ucx_client", request)


@pytest.mark.ucxx
def test_return_distance_ucxx(request):
_test_return_distance("ucxx_client", request)


def _test_default_n_neighbors(dask_client, request):
client = request.getfixturevalue(dask_client)

@@ -408,11 +353,6 @@ def test_default_n_neighbors_ucx(request):
_test_default_n_neighbors("ucx_client", request)


@pytest.mark.ucxx
def test_default_n_neighbors_ucxx(request):
_test_default_n_neighbors("ucxx_client", request)


def _test_one_query_partition(dask_client, request):
client = request.getfixturevalue(dask_client) # noqa

@@ -435,8 +375,3 @@ def test_one_query_partition(request):
@pytest.mark.ucx
def test_one_query_partition_ucx(request):
_test_one_query_partition("ucx_client", request)


@pytest.mark.ucxx
def test_one_query_partition_ucxx(request):
_test_one_query_partition("ucxx_client", request)