Hey! This isn't my usual wheelhouse, so apologies if I'm off base... but I think I've found a missing configuration that's causing GPU memory failures on Windows with DirectML.
I'm building a desktop app in Rust that uses fastembed for text and image embeddings (NomicEmbedTextV15Q / NomicEmbedVisionV15). After upgrading fastembed, I started getting this consistently:
```
Text embedding failed: Non-zero status code returned while running Mul node.
Name:'/encoder/layers.0/mlp/Mul_1'
Status Message: Not enough memory resources are available to complete this operation.
```
- Windows 11, RTX 4070 Ti Super (16GB VRAM)
- ort 2.0.0-rc.11 with directml feature
- fastembed 5.13.1 (candle 0.10.2) - works fine on 5.13.0 (candle 0.9.1)
- ONNX models via DirectML execution provider
What I think is happening:
The ONNX Runtime DirectML docs state:
> "The DirectML execution provider does not support the use of memory pattern optimizations or parallel execution in onnxruntime. Specifically, execution_mode must be set to ExecutionMode::ORT_SEQUENTIAL, and enable_mem_pattern must be false."
Looking at fastembed's session builder (in both `text_embedding/impl.rs` and `image_embedding/impl.rs`), the session is created like this:
```rust
Session::builder()?
    .with_execution_providers(execution_providers)?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(threads)?
    .commit_from_file(model_file_reference)?
```
Neither `.with_memory_pattern(false)` nor `.with_parallel_execution(false)` is set. Both are available on ort's `SessionBuilder`.
This likely worked under candle 0.9 because its graph happened to tolerate the misconfiguration; candle 0.10's changes to the Mul operation's memory allocation seem to expose it.
Suggested fix:
ort's `ExecutionProviderDispatch` has a `downcast_ref` method, so fastembed could detect DirectML and apply the required settings:
```rust
let has_directml = execution_providers.iter().any(|ep| {
    ep.downcast_ref::<ort::ep::DirectML>().is_some()
});

let mut builder = Session::builder()?
    .with_execution_providers(execution_providers)?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(threads)?;

if has_directml {
    builder = builder
        .with_memory_pattern(false)?
        .with_parallel_execution(false)?;
}

builder.commit_from_file(model_file_reference)?
```
The `DirectML` type is behind a feature gate in ort, so the detection would need a `#[cfg()]` guard to compile on non-Windows platforms.
This would need to be applied in 4 places: `try_new` and `try_new_from_user_defined` in both `text_embedding/impl.rs` and `image_embedding/impl.rs`.
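For what it's worth, the detect-then-reconfigure shape is easy to sanity-check in isolation. Here's a minimal self-contained sketch using `std::any::Any` as a stand-in for ort's dispatch type (the `DirectML` and `Cpu` structs below are hypothetical placeholders, not ort types), just to illustrate the `downcast_ref`-based detection:

```rust
use std::any::Any;

// Hypothetical stand-ins for ort's feature-gated DirectML provider and
// another execution provider -- for illustration only.
struct DirectML;
struct Cpu;

// Returns true if any provider in the list is the DirectML stand-in.
// This mirrors the proposed fastembed check, but over `dyn Any` values.
fn has_directml(providers: &[Box<dyn Any>]) -> bool {
    providers
        .iter()
        .any(|ep| ep.downcast_ref::<DirectML>().is_some())
}

fn main() {
    let with_dml: Vec<Box<dyn Any>> = vec![Box::new(Cpu), Box::new(DirectML)];
    assert!(has_directml(&with_dml));

    let cpu_only: Vec<Box<dyn Any>> = vec![Box::new(Cpu)];
    assert!(!has_directml(&cpu_only));
}
```

In the real fix the `has_directml` helper (or the whole check) would sit behind ort's `directml` feature, with a `#[cfg(not(...))]` fallback that simply returns `false` elsewhere.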
Again, this is a little out of my wheelhouse, so I might be heading in the wrong direction.
Happy to submit a PR if this seems right to get the ball rolling.