Merged
Changes from 2 commits
8 changes: 4 additions & 4 deletions README.md
@@ -42,7 +42,7 @@ growing list of algorithms. The following Python snippet reads input from a CSV
a NearestNeighbors query across a cluster of Dask workers, using multiple GPUs on a single node:


Initialize a `LocalCUDACluster` configured with [UCX](https://github.com/rapidsai/ucx-py) for fast transport of CUDA arrays
Initialize a `LocalCUDACluster` configured with [UCXX](https://github.com/rapidsai/ucxx) for fast transport of CUDA arrays
```python
# Initialize UCX for high-speed transport of CUDA arrays
from dask_cuda import LocalCUDACluster
@@ -102,8 +102,8 @@ repo](https://github.com/rapidsai/notebooks-contrib).
| **Nonlinear Models for Regression or Classification** | Random Forest (RF) Classification | Experimental multi-node multi-GPU via Dask |
| | Random Forest (RF) Regression | Experimental multi-node multi-GPU via Dask |
| | Inference for decision tree-based models | Forest Inference Library (FIL) |
| | K-Nearest Neighbors (KNN) Classification | Multi-node multi-GPU via Dask+[UCX](https://github.com/rapidsai/ucx-py), uses [Faiss](https://github.com/facebookresearch/faiss) for Nearest Neighbors Query. |
| | K-Nearest Neighbors (KNN) Regression | Multi-node multi-GPU via Dask+[UCX](https://github.com/rapidsai/ucx-py), uses [Faiss](https://github.com/facebookresearch/faiss) for Nearest Neighbors Query. |
| | K-Nearest Neighbors (KNN) Classification | Multi-node multi-GPU via Dask+[UCXX](https://github.com/rapidsai/ucxx), uses [Faiss](https://github.com/facebookresearch/faiss) for Nearest Neighbors Query. |
| | K-Nearest Neighbors (KNN) Regression | Multi-node multi-GPU via Dask+[UCXX](https://github.com/rapidsai/ucxx), uses [Faiss](https://github.com/facebookresearch/faiss) for Nearest Neighbors Query. |
| | Support Vector Machine Classifier (SVC) | |
| | Epsilon-Support Vector Regression (SVR) | |
| **Preprocessing** | Standardization, or mean removal and variance scaling / Normalization / Encoding categorical features / Discretization / Imputation of missing values / Polynomial features generation / and coming soon custom transformers and non-linear transformation | Based on Scikit-Learn preprocessing
@@ -114,7 +114,7 @@ repo](https://github.com/rapidsai/notebooks-contrib).
| | SHAP Permutation Explainer
| [Based on SHAP](https://shap.readthedocs.io/en/latest/) |
| **Execution device interoperability** | | Run estimators interchangeably from host/cpu or device/gpu with minimal code change [demo](https://docs.rapids.ai/api/cuml/stable/execution_device_interoperability.html) |
| **Other** | K-Nearest Neighbors (KNN) Search | Multi-node multi-GPU via Dask+[UCX](https://github.com/rapidsai/ucx-py), uses [Faiss](https://github.com/facebookresearch/faiss) for Nearest Neighbors Query. |
| **Other** | K-Nearest Neighbors (KNN) Search | Multi-node multi-GPU via Dask+[UCXX](https://github.com/rapidsai/ucxx), uses [Faiss](https://github.com/facebookresearch/faiss) for Nearest Neighbors Query. |

---

7 changes: 2 additions & 5 deletions ci/test_python_dask.sh
@@ -19,14 +19,11 @@ test_args=(
)

# Run tests
rapids-logger "pytest cuml-dask (No UCX-Py/UCXX)"
rapids-logger "pytest cuml-dask (No UCXX)"
timeout 2h ./ci/run_cuml_dask_pytests.sh "${test_args[@]}"

rapids-logger "pytest cuml-dask (UCX-Py only)"
timeout 5m ./ci/run_cuml_dask_pytests.sh "${test_args[@]}" --run_ucx

rapids-logger "pytest cuml-dask (UCXX only)"
timeout 5m ./ci/run_cuml_dask_pytests.sh "${test_args[@]}" --run_ucxx
timeout 5m ./ci/run_cuml_dask_pytests.sh "${test_args[@]}" --run_ucx

rapids-logger "Test script exiting with value: $EXITCODE"
exit ${EXITCODE}
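The `timeout`-wrapped invocations above can be sketched in plain Python. Everything in this snippet (the `run_with_timeout` helper, the trivial stand-in command) is illustrative and not part of the CI scripts; the 124 return code mirrors the convention of coreutils `timeout`:

```python
# Sketch of the CI pattern: run a test command under a wall-clock limit.
# coreutils `timeout` exits with status 124 when the limit is hit; we mirror that.
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd, returning its exit code, or 124 if it exceeds the limit."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, like `timeout` does.
        return 124


if __name__ == "__main__":
    # A trivial stand-in for the pytest invocation used in the CI script.
    code = run_with_timeout([sys.executable, "-c", "print('ok')"], 30)
    print("exit code:", code)
```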
2 changes: 1 addition & 1 deletion cpp/README.md
@@ -40,7 +40,7 @@ Current cmake offers the following configuration options:
| BUILD_CUML_STD_COMMS | [ON, OFF] | ON | Enable/disable building cuML NCCL+UCX communicator for running multi-node multi-GPU algorithms. Note that UCX support can also be enabled/disabled (see below). The standard communicator and MPI communicator are not mutually exclusive and can both be installed at the same time. |
| WITH_UCX | [ON, OFF] | OFF | Enable/disable UCX support in the standard cuML communicator. Algorithms requiring point-to-point messaging will not work when this is disabled. This flag is ignored if BUILD_CUML_STD_COMMS is set to OFF. |
| BUILD_CUML_MPI_COMMS | [ON, OFF] | OFF | Enable/disable building cuML MPI+NCCL communicator for running multi-node multi-GPU C++ tests. MPI communicator and STD communicator may both be installed at the same time. If OFF, it overrides BUILD_CUML_MG_TESTS to be OFF as well. |
| SINGLEGPU | [ON, OFF] | OFF | Disable all mnmg components. Disables building of all multi-GPU algorithms and all comms library components. Removes libcumlprims, UCX-py and NCCL dependencies. Overrides values of BUILD_CUML_MG_TESTS, BUILD_CUML_STD_COMMS, WITH_UCX and BUILD_CUML_MPI_COMMS. |
| SINGLEGPU | [ON, OFF] | OFF | Disable all mnmg components. Disables building of all multi-GPU algorithms and all comms library components. Removes libcumlprims, UCXX and NCCL dependencies. Overrides values of BUILD_CUML_MG_TESTS, BUILD_CUML_STD_COMMS, WITH_UCX and BUILD_CUML_MPI_COMMS. |
| DISABLE_OPENMP | [ON, OFF] | OFF | Set to `ON` to disable OpenMP |
| CMAKE_CUDA_ARCHITECTURES | List of GPU architectures, semicolon-separated | Empty | List the GPU architectures to compile the GPU targets for. Set to "NATIVE" to auto detect GPU architecture of the system, set to "ALL" to compile for all RAPIDS supported archs: ["60" "62" "70" "72" "75" "80" "86"]. |
| USE_CCACHE | [ON, OFF] | ON | Cache build artifacts with ccache. |
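As a usage sketch for the table above (cuML builds are normally driven by the repository's build scripts; this raw configure invocation, run from an assumed out-of-source build directory, is illustrative only):

```shell
# Illustrative configure step: SINGLEGPU=ON disables all multi-GPU algorithms
# and comms components, dropping the libcumlprims, UCXX and NCCL dependencies
# and overriding BUILD_CUML_MG_TESTS, BUILD_CUML_STD_COMMS, WITH_UCX and
# BUILD_CUML_MPI_COMMS as described in the table.
cmake -DSINGLEGPU=ON ..
```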
4 changes: 2 additions & 2 deletions python/cuml/README.md
@@ -31,7 +31,7 @@ example `setup.py --singlegpu`) are:
| Argument | Behavior |
| --- | --- |
| clean --all | Cleans all Python and Cython artifacts, including pycache folders, .cpp files resulting of cythonization and compiled extensions. |
| --singlegpu | Option to build cuML without multiGPU algorithms. Removes dependency on nccl, libcumlprims and ucx-py. |
| --singlegpu | Option to build cuML without multiGPU algorithms. Removes dependency on nccl, libcumlprims and ucxx. |


### RAFT Integration in cuml.raft
@@ -66,7 +66,7 @@ To build cuML's Python package, the following dependencies are required:

Packages required for multigpu algorithms*:
- libcumlprims version matching the cuML version
- ucx-py version matching the cuML version
- ucxx version matching the cuML version
- dask-cudf version matching the cuML version
- nccl>=2.5
- rapids-dask-dependency version matching the cuML version
3 changes: 1 addition & 2 deletions python/cuml/pyproject.toml
@@ -37,8 +37,7 @@ markers = [
"mg: Multi-GPU tests",
"memleak: Test that checks for memory leaks",
"no_bad_cuml_array_check: Test that should not check for bad CumlArray uses",
"ucx: Run _only_ Dask UCX-Py tests",
"ucxx: Run _only_ Dask UCXX tests",
"ucx: Run _only_ Dask UCXX tests",
]
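Assuming the `--run_ucx` option defined in `python/cuml/tests/dask/conftest.py`, a hypothetical invocation exercising only the consolidated `ucx` marker would look like:

```shell
# Illustrative only: --run_ucx makes the Dask conftest skip everything
# except tests carrying the consolidated `ucx` marker.
pytest python/cuml/tests/dask/ --run_ucx
```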

testpaths = [
46 changes: 3 additions & 43 deletions python/cuml/tests/dask/conftest.py
@@ -44,42 +44,22 @@ def client(cluster):
@pytest.fixture(scope="module")
def ucx_cluster():
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster(
protocol="ucx-old",
)
yield cluster
cluster.close()


@pytest.fixture(scope="function")
def ucx_client(ucx_cluster):
from dask.distributed import Client

client = Client(ucx_cluster)
yield client
client.close()


@pytest.fixture(scope="module")
def ucxx_cluster():
from dask_cuda import LocalCUDACluster
from dask_cuda.utils_test import IncreasedCloseTimeoutNanny

cluster = LocalCUDACluster(
protocol="ucxx",
protocol="ucx",
worker_class=IncreasedCloseTimeoutNanny,
)
yield cluster
cluster.close()


@pytest.fixture(scope="function")
def ucxx_client(ucxx_cluster):
def ucx_client(ucx_cluster):
pytest.importorskip("distributed_ucxx")
from dask.distributed import Client

client = Client(ucxx_cluster)
client = Client(ucx_cluster)
yield client
client.close()

@@ -91,13 +71,6 @@ def pytest_addoption(parser):
"--run_ucx",
action="store_true",
default=False,
help="run _only_ UCX-Py tests",
)

group.addoption(
"--run_ucxx",
action="store_true",
default=False,
help="run _only_ UCXX tests",
)

@@ -115,16 +88,3 @@ def pytest_collection_modifyitems(config, items):
for item in items:
if "ucx" in item.keywords:
item.add_marker(skip_ucx)

if config.getoption("--run_ucxx"):
skip_others = pytest.mark.skip(
reason="only runs when --run_ucxx is not specified"
)
for item in items:
if "ucxx" not in item.keywords:
item.add_marker(skip_others)
else:
skip_ucxx = pytest.mark.skip(reason="requires --run_ucxx to run")
for item in items:
if "ucxx" in item.keywords:
item.add_marker(skip_ucxx)
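With the UCX-Py path removed, the option handling above collapses to a single flag. A self-contained sketch of the resulting selection behavior follows; `FakeItem` and `select_items` are illustrative stand-ins, not cuML code:

```python
# Model of the single-flag behavior after UCX-Py removal:
# --run_ucx runs _only_ the ucx-marked Dask tests; without it they are skipped.


class FakeItem:
    """Minimal stand-in for a collected pytest item."""

    def __init__(self, name, keywords=()):
        self.name = name
        self.keywords = set(keywords)
        self.skip_reason = None  # recorded instead of pytest.mark.skip


def select_items(items, run_ucx):
    """Mirror pytest_collection_modifyitems for the consolidated ucx marker."""
    if run_ucx:
        for item in items:
            if "ucx" not in item.keywords:
                item.skip_reason = "only runs when --run_ucx is not specified"
    else:
        for item in items:
            if "ucx" in item.keywords:
                item.skip_reason = "requires --run_ucx to run"
    return items


items = [FakeItem("test_tcp"), FakeItem("test_ucx", {"ucx"})]
select_items(items, run_ucx=False)
print([i.skip_reason for i in items])
```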
65 changes: 0 additions & 65 deletions python/cuml/tests/dask/test_dask_nearest_neighbors.py
@@ -212,47 +212,6 @@ def test_compare_skl_ucx(
)


@pytest.mark.parametrize(
"nrows", [unit_param(300), quality_param(1e6), stress_param(5e8)]
)
@pytest.mark.parametrize("ncols", [10, 30])
@pytest.mark.parametrize(
"nclusters", [unit_param(5), quality_param(10), stress_param(15)]
)
@pytest.mark.parametrize(
"n_neighbors", [unit_param(10), quality_param(4), stress_param(100)]
)
@pytest.mark.parametrize(
"n_parts",
[unit_param(1), unit_param(5), quality_param(7), stress_param(50)],
)
@pytest.mark.parametrize(
"streams_per_handle,reverse_worker_order", [(5, True), (10, False)]
)
@pytest.mark.ucxx
def test_compare_skl_ucxx(
nrows,
ncols,
nclusters,
n_parts,
n_neighbors,
streams_per_handle,
reverse_worker_order,
request,
):
_test_compare_skl(
nrows,
ncols,
nclusters,
n_parts,
n_neighbors,
streams_per_handle,
reverse_worker_order,
"ucxx_client",
request,
)


def _test_batch_size(nrows, ncols, n_parts, batch_size, dask_client, request):
client = request.getfixturevalue(dask_client)

@@ -307,15 +266,6 @@ def test_batch_size_ucx(nrows, ncols, n_parts, batch_size, request):
_test_batch_size(nrows, ncols, n_parts, batch_size, "ucx_client", request)


@pytest.mark.parametrize("nrows", [unit_param(1000), stress_param(1e5)])
@pytest.mark.parametrize("ncols", [unit_param(10), stress_param(500)])
@pytest.mark.parametrize("n_parts", [unit_param(10), stress_param(100)])
@pytest.mark.parametrize("batch_size", [unit_param(100), stress_param(1e3)])
@pytest.mark.ucxx
def test_batch_size_ucxx(nrows, ncols, n_parts, batch_size, request):
_test_batch_size(nrows, ncols, n_parts, batch_size, "ucxx_client", request)


def _test_return_distance(dask_client, request):
client = request.getfixturevalue(dask_client)

@@ -357,11 +307,6 @@ def test_return_distance_ucx(request):
_test_return_distance("ucx_client", request)


@pytest.mark.ucxx
def test_return_distance_ucxx(request):
_test_return_distance("ucxx_client", request)


def _test_default_n_neighbors(dask_client, request):
client = request.getfixturevalue(dask_client)

@@ -408,11 +353,6 @@ def test_default_n_neighbors_ucx(request):
_test_default_n_neighbors("ucx_client", request)


@pytest.mark.ucxx
def test_default_n_neighbors_ucxx(request):
_test_default_n_neighbors("ucxx_client", request)


def _test_one_query_partition(dask_client, request):
client = request.getfixturevalue(dask_client) # noqa

@@ -435,8 +375,3 @@ def test_one_query_partition(request):
@pytest.mark.ucx
def test_one_query_partition_ucx(request):
_test_one_query_partition("ucx_client", request)


@pytest.mark.ucxx
def test_one_query_partition_ucxx(request):
_test_one_query_partition("ucxx_client", request)