diff --git a/datafusion/core/src/lib.rs b/datafusion/core/src/lib.rs index 989799b1f874..096f18ceb916 100644 --- a/datafusion/core/src/lib.rs +++ b/datafusion/core/src/lib.rs @@ -498,10 +498,21 @@ //! While preparing for execution, DataFusion tries to create this many distinct //! `async` [`Stream`]s for each [`ExecutionPlan`]. //! The [`Stream`]s for certain [`ExecutionPlan`]s, such as [`RepartitionExec`] -//! and [`CoalescePartitionsExec`], spawn [Tokio] [`task`]s, that are run by +//! and [`CoalescePartitionsExec`], spawn [Tokio] [`task`]s, that run on //! threads managed by the [`Runtime`]. //! Many DataFusion [`Stream`]s perform CPU intensive processing. //! +//! ### Cooperative Scheduling +//! +//! DataFusion uses cooperative scheduling, which means that each [`Stream`] +//! is responsible for yielding control back to the [`Runtime`] after +//! some amount of work is done. Please see the [`coop`] module documentation +//! for more details. +//! +//! [`coop`]: datafusion_physical_plan::coop +//! +//! ### Network I/O and CPU intensive tasks +//! //! Using `async` for CPU intensive tasks makes it easy for [`TableProvider`]s //! to perform network I/O using standard Rust `async` during execution. //! However, this design also makes it very easy to mix CPU intensive and latency diff --git a/datafusion/physical-plan/src/coop.rs b/datafusion/physical-plan/src/coop.rs index d55c7b8c97af..be0afa07eac2 100644 --- a/datafusion/physical-plan/src/coop.rs +++ b/datafusion/physical-plan/src/coop.rs @@ -19,7 +19,7 @@ //! //! # Cooperative scheduling //! -//! A single call to `poll_next` on a top-level `Stream` may potentially perform a lot of work +//! A single call to `poll_next` on a top-level [`Stream`] may potentially perform a lot of work //! before it returns a `Poll::Pending`. Think for instance of calculating an aggregation over a //! large dataset. //! If a `Stream` runs for a long period of time without yielding back to the Tokio executor,