Merged
62 changes: 60 additions & 2 deletions datafusion/execution/src/memory_pool/mod.rs
@@ -57,8 +57,8 @@ pub use pool::*;
/// `GroupByHashExec`. It does NOT track and limit memory used internally by
/// other operators such as `DataSourceExec` or the `RecordBatch`es that flow
/// between operators. Furthermore, operators should not reserve memory for the
-/// batches they produce. Instead, if a parent operator needs to hold batches
-/// from its children in memory for an extended period, it is the parent
+/// batches they produce. Instead, if a consumer operator needs to hold batches
+/// from its producers in memory for an extended period, it is the consumer
/// operator's responsibility to reserve the necessary memory for those batches.
///
/// In order to avoid allocating memory until the OS or the container system
@@ -98,6 +98,64 @@ pub use pool::*;
/// operator will spill the intermediate buffers to disk, and release memory
/// from the memory pool, and continue to retry memory reservation.
///
/// # Related Structs
///
/// To better understand memory management in DataFusion, here are the key structs
/// and their relationships:
///
/// - [`MemoryConsumer`]: A named allocation traced by a particular operator. If an
///   execution is parallelized, and there are multiple partitions of the same
///   operator, each partition will have a separate `MemoryConsumer`.
/// - `SharedRegistration`: A registration of a `MemoryConsumer` with a `MemoryPool`.
///   `SharedRegistration` and `MemoryPool` have a many-to-one relationship. A `MemoryPool`
///   implementation can decide how to allocate memory based on the registered consumers
///   (e.g., `FairSpillPool` will try to share available memory evenly among all registered
///   consumers).
/// - [`MemoryReservation`]: Each `MemoryConsumer`/operator can have multiple
///   `MemoryReservation`s for different internal data structures. The relationship
///   between `MemoryConsumer` and `MemoryReservation` is one-to-many. This design
///   enables cleaner operator implementations:
///   - Different `MemoryReservation`s can be used for different purposes
///   - `MemoryReservation` follows RAII principles - to release a reservation,
///     simply drop the corresponding `MemoryReservation` object
Contributor

This comment is good for explaining that MemoryReservation uses RAII-style dropping, but it might be clearer if we also mention that it is automatically unregistered from the memory pool via SharedRegistration. That way, it’s easier to understand the role of SharedRegistration.

Contributor Author

Updated, thanks!

///
/// ## Relationship Diagram
///
/// ```text
///     ┌──────────────────┐     ┌──────────────────┐
///     │MemoryReservation │     │MemoryReservation │
///     └───┬──────────────┘     └──────────────────┘ ......
///         │belongs to                    │
///         │      ┌───────────────────────┘           │  │
///         │      │                                   │  │
///         ▼      ▼                                   ▼  ▼
///     ┌────────────────────────┐       ┌────────────────────────┐
///     │   SharedRegistration   │       │   SharedRegistration   │
///     │   ┌────────────────┐   │       │   ┌────────────────┐   │
///     │   │                │   │       │   │                │   │
///     │   │ MemoryConsumer │   │       │   │ MemoryConsumer │   │
///     │   │                │   │       │   │                │   │
///     │   └────────────────┘   │       │   └────────────────┘   │
///     └────────────┬───────────┘       └────────────┬───────────┘
///                  │                                │
///                  │                        register│into
///                  │                                │
///                  └─────────────┐   ┌──────────────┘
///                                │   │
///                                ▼   ▼
///        ╔═══════════════════════════════════════════════════╗
///        ║                                                   ║
///        ║                    MemoryPool                     ║
///        ║                                                   ║
///        ╚═══════════════════════════════════════════════════╝
/// ```
///
/// For example, there are two parallel partitions of an operator X: each partition
/// corresponds to a `MemoryConsumer` in the above diagram. Inside each partition of
/// operator X, there are typically several `MemoryReservation`s - one for each
/// internal data structure that needs memory tracking (e.g., 1 reservation for the hash
/// table, and 1 reservation for buffered input, etc.).
///
/// # Implementing `MemoryPool`
///
/// You can implement a custom allocation policy by implementing the
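To make the relationships described in the new doc comment concrete, here is a minimal, standard-library-only sketch of the same shape: one pool, one registration per consumer, and several reservations per registration that give their bytes back when dropped. It is a toy model, not DataFusion's implementation. `ToyPool`, `ToyRegistration`, `ToyReservation`, and `reservation_demo` are hypothetical names invented for this illustration, and the real `MemoryPool`, `MemoryConsumer`, and `MemoryReservation` types have different signatures and more behavior (for example, unregistering a consumer from the pool).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a memory pool: a byte counter with a hard limit.
struct ToyPool {
    limit: usize,
    used: AtomicUsize,
}

impl ToyPool {
    fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { limit, used: AtomicUsize::new(0) })
    }

    fn reserved(&self) -> usize {
        self.used.load(Ordering::Relaxed)
    }

    /// Atomically add `additional` bytes, or return the current usage on failure.
    fn try_grow(&self, additional: usize) -> Result<(), usize> {
        self.used
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |used| {
                let new_used = used.checked_add(additional)?;
                (new_used <= self.limit).then_some(new_used)
            })
            .map(|_| ())
    }

    fn shrink(&self, bytes: usize) {
        self.used.fetch_sub(bytes, Ordering::Relaxed);
    }
}

/// Hypothetical stand-in for a registration: ties one named consumer to one pool.
struct ToyRegistration {
    name: String,
    pool: Arc<ToyPool>,
}

/// Hypothetical stand-in for a reservation: many can share one registration.
struct ToyReservation {
    registration: Arc<ToyRegistration>,
    size: usize,
}

impl ToyReservation {
    fn new(registration: &Arc<ToyRegistration>) -> Self {
        Self { registration: Arc::clone(registration), size: 0 }
    }

    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let reg = &self.registration;
        reg.pool.try_grow(bytes).map_err(|used| {
            format!(
                "{} cannot reserve {} bytes: {} of {} already used",
                reg.name, bytes, used, reg.pool.limit
            )
        })?;
        self.size += bytes;
        Ok(())
    }
}

impl Drop for ToyReservation {
    /// RAII: dropping the reservation returns its bytes to the pool.
    fn drop(&mut self) {
        self.registration.pool.shrink(self.size);
    }
}

/// One partition of an operator with two reservations (hash table, buffered input).
fn reservation_demo() -> (usize, bool, usize, usize) {
    let pool = ToyPool::new(100);
    let registration = Arc::new(ToyRegistration {
        name: "operator_x[partition 0]".to_string(),
        pool: Arc::clone(&pool),
    });

    let mut hash_table = ToyReservation::new(&registration);
    let mut input_buffer = ToyReservation::new(&registration);
    hash_table.try_grow(64).unwrap();
    input_buffer.try_grow(32).unwrap();
    let with_both = pool.reserved();

    // Exceeding the limit is rejected and leaves the pool unchanged.
    let rejected = input_buffer.try_grow(16).is_err();

    drop(input_buffer);
    let after_one_drop = pool.reserved();
    drop(hash_table);
    let after_all_drops = pool.reserved();

    (with_both, rejected, after_one_drop, after_all_drops)
}
```

In this sketch, each reservation holds an `Arc` to the shared registration, so the registration (and through it the pool) stays alive as long as any reservation does. That mirrors the "belongs to" arrows in the diagram, under the simplifying assumption that the pool only counts bytes.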