Improve documentation on asynchronous execution #6859

@csadorf

The Python developer docs contain a section on asynchronous operations that incorrectly suggests estimators execute concurrently when each is given its own stream and handle. Testing shows that calls such as kmeans1.fit(X1) and kmeans2.fit(X2) run sequentially, not concurrently.

We should remove or revise this section to accurately reflect the actual behavior of cuML operations.

Here is the relevant (slightly adjusted) example:

```python
import cuml
import numpy as np
from pylibraft.common import DeviceResources, Stream

# Create two separate streams and handles
s1 = Stream()
h1 = DeviceResources(stream=s1)

s2 = Stream()
h2 = DeviceResources(stream=s2)

# Create sample data
X1 = np.array([[1, 2], [2, 3]], dtype=np.float32)
X2 = np.array([[4, 5], [6, 7]], dtype=np.float32)

# Create and run algorithms concurrently
kmeans1 = cuml.KMeans(handle=h1, n_clusters=2)
kmeans2 = cuml.KMeans(handle=h2, n_clusters=2)

# These operations will run concurrently
kmeans1.fit(X1)
kmeans2.fit(X2)

# Wait for both operations to complete
s1.sync()
s2.sync()

# Access results
print(kmeans1.cluster_centers_)
print(kmeans2.cluster_centers_)
```

Original example

```python
import cuml
from cuml.cuda import Stream
s1 = Stream()
h1 = cuml.Handle()
h1.setStream(s1)
s2 = Stream()
h2 = cuml.Handle()
h2.setStream(s2)
algo1 = cuml.Algo1(handle=h1, ...)
algo2 = cuml.Algo2(handle=h2, ...)
algo1.fit(X1, y1)
algo2.fit(X2, y2)
```
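
One way to check whether two fit calls actually overlap is to compare the wall-clock time of running both against the sum of running each alone. The sketch below is only an illustration of that measurement, not the test that was run for this issue: `timed`, `overlap_ratio`, and the sleep-based stand-in tasks are hypothetical helpers that are not part of cuML, and the threaded variant exists solely to show what an overlapping result looks like.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn, repeats=3):
    """Return the best wall-clock time (seconds) of fn() over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def overlap_ratio(task_a, task_b, run_both):
    """Return time(run_both) / (time(task_a) + time(task_b)).

    A value near 1.0 means the two tasks ran back to back; a clearly
    smaller value means their execution overlapped.
    """
    t_a = timed(task_a)
    t_b = timed(task_b)
    t_both = timed(run_both)
    return t_both / (t_a + t_b)


# Stand-in workloads (assumption) so the sketch runs without a GPU.
def task_a():
    time.sleep(0.05)


def task_b():
    time.sleep(0.05)


def sequential():
    # Mirrors the shape of the example: one call after the other.
    task_a()
    task_b()


def threaded():
    # Reference case that is known to overlap, for comparison only.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(task_a), pool.submit(task_b)]
        for future in futures:
            future.result()


sequential_ratio = overlap_ratio(task_a, task_b, sequential)
threaded_ratio = overlap_ratio(task_a, task_b, threaded)
print(f"sequential: {sequential_ratio:.2f}, threaded: {threaded_ratio:.2f}")
```

To apply this to the example above, the stand-ins would be replaced by callables along the lines of `lambda: (kmeans1.fit(X1), s1.sync())`. A ratio close to 1.0 would match the sequential behavior reported here. The 2x2 inputs are likely too small for meaningful timing, so larger arrays would be needed.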

Relevant discussion
