DirectML: Missing required session options (memory_pattern, execution_mode) causes OOM on Windows

Hey! This isn't my usual wheelhouse, so apologies if I'm off base. But I think I've found a missing configuration that's causing GPU memory failures on Windows with DirectML.

I'm building a desktop app in Rust that uses fastembed for text and image embeddings (NomicEmbedTextV15Q / NomicEmbedVisionV15). After upgrading fastembed, I started getting this consistently:

```
Text embedding failed: Non-zero status code returned while running Mul node.
Name:'/encoder/layers.0/mlp/Mul_1'
Status Message: Not enough memory resources are available to complete this operation.
```

- Windows 11, RTX 4070 Ti Super (16GB VRAM)
- ort 2.0.0-rc.11 with directml feature
- fastembed 5.13.1 (candle 0.10.2) - works fine on 5.13.0 (candle 0.9.1)
- ONNX models via DirectML execution provider

What I think is happening:

The ONNX Runtime [DirectML docs](https://onnxruntime.ai/docs/execution-providers/DirectML-ExecutionProvider.html) state:

> "The DirectML execution provider does not support the use of memory pattern optimizations or parallel execution in onnxruntime. Specifically, execution_mode must be set to ExecutionMode::ORT_SEQUENTIAL, and enable_mem_pattern must be false."
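Here's a minimal, stdlib-only sketch that checks a session configuration against those two requirements. `ExecutionMode`, `SessionConfig`, and `check_directml_requirements` are hypothetical stand-ins for illustration, not ort or ONNX Runtime types. They only model the two settings the docs call out.

```rust
/// Hypothetical mirror of ONNX Runtime's execution modes (not ort's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Sequential,
    Parallel,
}

/// Hypothetical model of the two session settings the DirectML docs mention.
#[derive(Debug, Clone, Copy)]
pub struct SessionConfig {
    pub execution_mode: ExecutionMode,
    pub enable_mem_pattern: bool,
}

/// Returns every DirectML requirement that `cfg` violates (empty if none).
pub fn check_directml_requirements(cfg: &SessionConfig) -> Vec<&'static str> {
    let mut problems = Vec::new();
    // Requirement 1: execution must be sequential.
    if cfg.execution_mode != ExecutionMode::Sequential {
        problems.push("execution_mode must be ORT_SEQUENTIAL");
    }
    // Requirement 2: memory pattern optimization must be disabled.
    if cfg.enable_mem_pattern {
        problems.push("enable_mem_pattern must be false");
    }
    problems
}
```

A config with the memory pattern left on fails the check even when execution is sequential. That's the case I suspect fastembed is in.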

Looking at fastembed's session builder (in both `text_embedding/impl.rs` and `image_embedding/impl.rs`), the session is created like this:

```rust
Session::builder()?
    .with_execution_providers(execution_providers)?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(threads)?
    .commit_from_file(model_file_reference)?
```

Neither `.with_memory_pattern(false)` nor `.with_parallel_execution(false)` (which selects `ORT_SEQUENTIAL`) is set, even though both are available on ort's `SessionBuilder`.

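This probably worked on 5.13.0 by luck. The misconfiguration was likely always there, and something in the upgrade (candle 0.9.1 to 0.10.2 is the most visible change) pushed memory usage over the edge. I'm not sure candle itself is the cause, though, since the failing `Mul` node is run by ONNX Runtime's DirectML provider rather than by candle.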

Suggested fix:

ort's `ExecutionProviderDispatch` has a `downcast_ref` method, so fastembed could detect DirectML and apply the required settings:

```rust
// Check before `execution_providers` is moved into the builder.
let has_directml = execution_providers.iter().any(|ep| {
    ep.downcast_ref::<ort::ep::DirectML>().is_some()
});

let mut builder = Session::builder()?
    .with_execution_providers(execution_providers)?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(threads)?;

if has_directml {
    // DirectML requires memory pattern off and sequential execution.
    builder = builder
        .with_memory_pattern(false)?
        .with_parallel_execution(false)?;
}

builder.commit_from_file(model_file_reference)?
```
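The detection is type-based: it checks whether any entry in a heterogeneous provider list is concretely DirectML. I can't verify ort's internals here, so this is a stdlib-only sketch of the same technique using `std::any::Any`. `ExecutionProvider`, `DirectML`, `Cuda`, and `Cpu` below are hypothetical placeholder types, not ort's real ones.

```rust
use std::any::Any;

/// Hypothetical stand-in for an execution-provider trait: each provider can
/// expose itself as `&dyn Any` so callers can test its concrete type.
pub trait ExecutionProvider {
    fn as_any(&self) -> &dyn Any;
}

// Placeholder provider types (not ort's real structs).
pub struct DirectML;
pub struct Cuda;
pub struct Cpu;

impl ExecutionProvider for DirectML {
    fn as_any(&self) -> &dyn Any {
        self
    }
}
impl ExecutionProvider for Cuda {
    fn as_any(&self) -> &dyn Any {
        self
    }
}
impl ExecutionProvider for Cpu {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// True if any registered provider is concretely a `DirectML`.
pub fn has_directml(providers: &[Box<dyn ExecutionProvider>]) -> bool {
    providers
        .iter()
        .any(|ep| ep.as_any().downcast_ref::<DirectML>().is_some())
}
```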

The `DirectML` type only exists when ort's `directml` feature is enabled, so the detection would need a `#[cfg(...)]` guard for fastembed to compile without that feature (e.g. on non-Windows platforms).
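Here's a rough sketch of that `#[cfg]` split. It assumes fastembed had its own crate feature (called `directml` here, a hypothetical name) that forwards to ort's `directml` feature. `ProviderEntry` is a placeholder for ort's provider type. In the real code, the feature-on branch would hold the `downcast_ref` check. The feature-off branch just returns `false`, on the assumption that without the feature no DirectML provider can be constructed.

```rust
/// Placeholder for an execution-provider entry (hypothetical, not ort's type).
pub struct ProviderEntry {
    pub name: &'static str,
}

/// Feature on: detect DirectML. In fastembed this body would be the
/// `downcast_ref::<ort::ep::DirectML>()` check; a name comparison stands in here.
#[cfg(feature = "directml")]
pub fn has_directml(providers: &[ProviderEntry]) -> bool {
    providers.iter().any(|ep| ep.name == "DirectML")
}

/// Feature off: the DirectML type doesn't exist, so there is nothing to detect.
/// This version compiles on every target, including non-Windows ones.
#[cfg(not(feature = "directml"))]
pub fn has_directml(_providers: &[ProviderEntry]) -> bool {
    false
}
```

Both functions share one signature, so call sites don't need their own `#[cfg]` attributes.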

This would need to be applied in 4 places: `try_new` and `try_new_from_user_defined` in both `text_embedding/impl.rs` and `image_embedding/impl.rs`.
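Since the same change is needed in four constructors, one option is to put it in a single helper that each constructor calls just before committing the session. Below is a stdlib-only sketch. `MockSessionBuilder` is a hypothetical stand-in that mimics the chained, `Result`-returning style of ort's builder, and `apply_directml_settings` is also a hypothetical name. A real version would take and return ort's `SessionBuilder` instead.

```rust
/// Hypothetical stand-in for ort's SessionBuilder; only tracks the two flags.
#[derive(Debug)]
pub struct MockSessionBuilder {
    pub memory_pattern: bool,
    pub parallel_execution: bool,
}

impl MockSessionBuilder {
    pub fn with_memory_pattern(mut self, enable: bool) -> Result<Self, String> {
        self.memory_pattern = enable;
        Ok(self)
    }

    pub fn with_parallel_execution(mut self, parallel: bool) -> Result<Self, String> {
        self.parallel_execution = parallel;
        Ok(self)
    }
}

/// Applies DirectML's required settings only when DirectML is in use, so
/// CPU/CUDA sessions keep their existing configuration untouched.
pub fn apply_directml_settings(
    builder: MockSessionBuilder,
    has_directml: bool,
) -> Result<MockSessionBuilder, String> {
    if !has_directml {
        return Ok(builder);
    }
    builder
        .with_memory_pattern(false)?
        .with_parallel_execution(false)
}
```

Keeping it in one helper means a future DirectML requirement only has to be added once, rather than in all four constructors.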

Again, this is a little out of my wheelhouse, so I might be heading in the wrong direction.
Happy to submit a PR to get the ball rolling if this seems right.

DirectML: Missing required session options (memory_pattern, execution_mode) causes OOM on Windows #245