DirectML: Missing required session options (memory_pattern, execution_mode) causes OOM on Windows #245

@markgandolfo

Hey! This isn't my usual wheelhouse, so apologies if I'm off base, but I think I've found a missing configuration that's causing GPU memory failures on Windows with DirectML.

I'm building a desktop app in Rust that uses fastembed for text and image embeddings (NomicEmbedTextV15Q / NomicEmbedVisionV15). After upgrading fastembed, I started getting this consistently:

Text embedding failed: Non-zero status code returned while running Mul node.
Name:'/encoder/layers.0/mlp/Mul_1'
Status Message: Not enough memory resources are available to complete this operation.

Environment:

  • Windows 11, RTX 4070 Ti Super (16GB VRAM)
  • ort 2.0.0-rc.11 with directml feature
  • fastembed 5.13.1 (candle 0.10.2) - works fine on 5.13.0 (candle 0.9.1)
  • ONNX models via DirectML execution provider

What I think is happening:

The ONNX Runtime DirectML docs state:

"The DirectML execution provider does not support the use of memory pattern optimizations or parallel execution in onnxruntime. Specifically, execution_mode must be set to ExecutionMode::ORT_SEQUENTIAL, and enable_mem_pattern must be false."

Looking at fastembed's session builder (in both text_embedding/impl.rs and image_embedding/impl.rs), the session is created like this:

Session::builder()?
    .with_execution_providers(execution_providers)?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(threads)?
    .commit_from_file(model_file_reference)?

Neither .with_memory_pattern(false) nor .with_parallel_execution(false) is set. Both are available on ort's SessionBuilder.

This likely worked under candle 0.9 because its graph happened to be forgiving of the misconfiguration, but candle 0.10's changes to the Mul operation's memory allocation seem to expose it.

Suggested fix:

ort's ExecutionProviderDispatch has a downcast_ref method, so fastembed could detect DirectML and apply the required settings:

let has_directml = execution_providers.iter().any(|ep| {
    ep.downcast_ref::<ort::ep::DirectML>().is_some()
});

let mut builder = Session::builder()?
    .with_execution_providers(execution_providers)?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(threads)?;

if has_directml {
    builder = builder
        .with_memory_pattern(false)?
        .with_parallel_execution(false)?;
}

builder.commit_from_file(model_file_reference)?

The DirectML type is behind a feature gate in ort, so the detection would need a matching #[cfg(...)] guard to compile when that feature is off, for example on non-Windows platforms.
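For what it's worth, here's a rough sketch of the guard pattern I have in mind. It doesn't touch ort at all: `Provider` is a hypothetical stand-in for the execution provider list, and the feature name `directml` is my assumption about how the gate would be named on the fastembed side.

```rust
/// Hypothetical stand-in for an execution provider entry (not ort's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    Cpu,
    Cuda,
    DirectMl,
}

/// Compiled only when the (assumed) `directml` feature is enabled:
/// actually inspects the provider list.
#[cfg(feature = "directml")]
fn has_directml(providers: &[Provider]) -> bool {
    providers.iter().any(|p| *p == Provider::DirectMl)
}

/// Fallback when the feature is off: the DirectML type would not exist,
/// so detection is skipped and always reports false.
#[cfg(not(feature = "directml"))]
fn has_directml(_providers: &[Provider]) -> bool {
    false
}
```

With the feature off, `has_directml` always returns false, so the builder chain would stay exactly as it is today on platforms without DirectML.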

This would need to be applied in four places: try_new and try_new_from_user_defined in both text_embedding/impl.rs and image_embedding/impl.rs.
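To avoid repeating the same check at every call site, the logic could probably live in one shared helper. This is only a sketch with made-up names: `ProviderKind`, `SessionSettings`, and `apply_directml_requirements` are hypothetical and model the two flags as plain booleans rather than calling ort's SessionBuilder.

```rust
/// Hypothetical stand-in for the kind of execution provider requested.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProviderKind {
    Cpu,
    Cuda,
    DirectMl,
}

/// Hypothetical model of the two session options this issue is about.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SessionSettings {
    memory_pattern: bool,
    parallel_execution: bool,
}

/// Returns `base` unchanged unless DirectML is among the providers, in
/// which case both options are forced off, as the quoted docs require.
fn apply_directml_requirements(
    base: SessionSettings,
    providers: &[ProviderKind],
) -> SessionSettings {
    if providers.contains(&ProviderKind::DirectMl) {
        SessionSettings {
            memory_pattern: false,
            parallel_execution: false,
        }
    } else {
        base
    }
}
```

Each of the constructors would then call the helper once before committing the session, so a future change to the DirectML requirements only has to be made in one spot.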

Again, this is a little out of my wheelhouse, so I might be heading in the wrong direction.
Happy to submit a PR to get the ball rolling if this seems right.
