Add OpenVINO export and inference support for ministral reasoning model 2510#1669
dhandhalyabhavik wants to merge 1 commit into huggingface:main from
Conversation
Enable mistral3 model type for OpenVINO export and inference in optimum-intel. This adds support for the Ministral-3-3B-Reasoning VLM model with Pixtral vision encoder and Ministral3 language model.

Changes:
- exporters/openvino/utils.py: register mistral3 in MULTI_MODAL_TEXT_GENERATION_MODELS
- exporters/openvino/model_configs.py: add Mistral3OpenVINOConfig with task registration, text_config model_type fallback (ministral3 -> mistral), and VLM behavior routing
- exporters/openvino/model_patcher.py: add an inlined vision pipeline patcher (Mistral3ImageEmbeddingModelPatcher), a language model patcher fixing sliding_window=None and cache_position injection (Mistral3LanguageModelPatcher), and an eager_mask_without_vmap compatibility fix for the transformers >= 5.4 q_length API change
- intel/openvino/modeling_visual_language.py: add _OVMistral3ForCausalLM inference class with vision/text embedding merge via masked_scatter

Tested with FP16/INT8/INT4 exports, text generation, image description, and a 5-turn multi-turn conversation. No repetition issues detected.
Sample chat.py file for experimenting with reasoning on/off.
Sample output:
Multi-turn output:
This is quite compact. It assumes `graph` is a dictionary where keys are nodes and values are lists of adjacent nodes. The function returns the order of visitation. If we want to make it even shorter, perhaps by avoiding the check against `result` each time (which may lead to duplicates), but then we'd lose the uniqueness guarantee. Alternatively, we could pass a visited set, but that might complicate things.
Another approach is to use a stack iteratively, which might be shorter due to lack of recursion overhead in terms of lines, though not necessarily character count.
Iterative DFS:
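The code block itself did not survive the page copy; an iterative sketch consistent with the surrounding discussion (graph as a dict of adjacency lists, returning the visitation order; the function name `dfs` is an assumption) might look like:

```python
def dfs(g, v):
    # Iterative DFS with an explicit stack; returns the visitation order.
    res, stack = [], [v]
    while stack:
        u = stack.pop()
        if u not in res:              # membership check keeps each node visited once
            res.append(u)
            stack.extend(g.get(u, []))
    return res
```

With `g = {'A': ['B', 'C'], 'B': ['D'], 'C': [], 'D': []}`, `dfs(g, 'A')` visits every node reachable from 'A' exactly once.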
Hmm, this seems similar in length. But the problem asks for the shortest code, so perhaps counting characters matters.
The first version is 97 characters (without whitespace for readability):
Wait, that looks messy. Maybe stick to the clear iterative version.
Actually, the most concise might be the recursive one, assuming the graph is passed correctly.
Perhaps the following is the shortest:
But this doesn’t return the path; it modifies the list in place. If we need to capture the path, the previous versions are needed.
Assuming the goal is to get the traversal order, the first recursive version is probably the way to go, and it's relatively short.
So, final answer:
Character count: 85 (excluding spaces for indentation, but usually we count actual characters including spaces).
Alternative, using a default parameter for visited set to avoid global state:
This is 106 characters, which is longer than the previous version. So the first one is shorter.
Thus, the shortest is likely the first one.
Now, to present the answer clearly.
✅ Answer:
Here's a concise implementation of DFS in Python, returning the order of visited nodes starting from `v`. This works with a graph represented as a dictionary where keys are nodes and values are lists of adjacent nodes. The function recursively visits unvisited neighbors, appending them to the result list. The base case checks if the node exists in the graph. This is approximately 85 characters long.
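The code itself was lost from the transcript; a recursive sketch matching this description (dict adjacency list, result-list membership check, function name `dfs` assumed) would be roughly:

```python
def dfs(g, v, res=None):
    # Recursive DFS; appends each newly seen node to res and returns it.
    if res is None:
        res = []
    res.append(v)
    for u in g.get(v, []):   # g.get handles nodes absent from the dict
        if u not in res:     # membership check against the result list
            dfs(g, u, res)
    return res
```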
[Turn 3/5] You: what is its time complexity?
Both `max_new_tokens` (=1024) and `max_length` (=262144) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Assistant [1024 tok, 67.0s, 15.3 tok/s]:
[THINK] Okay, the user asked about the time complexity of the provided DFS implementation. Let me recall how to analyze the time complexity for this recursive DFS function.
Given the function:
Assumptions:
Time Complexity Analysis:
Since we traverse each node and edge once, the time complexity is O(V + E). However, note that checking membership in the result list (`u not in res`) is O(k) where k is the size of the result list, which grows up to V. Thus, the full analysis shows that the inner loop adds an extra factor of O(V) per level, leading to O((V + E)V) in the worst case for dense graphs. Wait, actually, in practice, the `res` list contains only the current path, so the membership test is O(L) where L is the length of the current path, which is at most V. And since the sum over all paths is linear in V+E, the overall complexity remains O(V + E) * average path length, which simplifies to O(VE) in the worst case for sparse graphs? Hmm, perhaps it's better to think that the work done at each node is proportional to its degree, and the sum of degrees is 2E. Then the total time is O(E) for processing edges, and O(V) for building the result list. So overall O(V + E). Wait, perhaps the standard statement is that DFS runs in O(V + E) time regardless of the data structure, assuming constant-time access to neighbors.
To confirm, let me double-check:
Each node is added to the result list exactly once, and each edge is considered twice (once in each direction, except for undirected graphs where it's once). The main cost comes from iterating over the adjacency list of each node. Assuming that the adjacency list representation allows us to iterate over neighbors in O(degree(node)) time, summing these over all nodes gives O(E) for directed graphs (since each edge is processed once) and O(2E) for undirected graphs (each edge counted twice). Additionally, the list append operation is amortized O(1), and the membership tests in the result list—though technically, the `res` list is growing, and the `in` operator is O(len(res)), which in the worst case is O(V) per node. However, since each node is appended only once, the total cost of all `in` checks is O(V^2) in the worst case (e.g., for a complete graph). But this is less important compared to the edge processing. However, many sources simplify this to O(V + E) by considering that the dominant term is the edge traversal, and the list operations are negligible or assumed to be constant time for practical purposes. To be precise, the exact time complexity is O(V + E) for the iteration part, and O(V^2) for the list membership checks, making it O(V + E + V^2) = O(V^2) in the worst case for connected graphs. But often, the latter is ignored, focusing on the edge-based complexity.
For simplicity, the commonly accepted time complexity for DFS is O(V + E), especially when the graph is stored in an adjacency list and the list membership is considered constant time (perhaps by maintaining a separate visited set instead of checking in the result list).
Ah! Perhaps the original code can be improved to use a visited set for true O(V + E) time. Here's how:
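The code block here was stripped as well; a plausible reconstruction of the visited-set variant (the names `seen` and `res` are assumptions):

```python
def dfs(g, v, seen=None, res=None):
    # DFS with an explicit visited set: set membership is O(1) on average,
    # so the traversal is O(V + E) overall.
    if seen is None:
        seen, res = set(), []
    seen.add(v)
    res.append(v)
    for u in g.get(v, []):
        if u not in seen:
            dfs(g, u, seen, res)
    return res
```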
With this, the time complexity is strictly O(V + E) because adding to a set and checking membership are O(1) on average, and we process each node and edge once.
But the original question refers to the first implementation, which
[Turn 4/5] You: ^C
Exiting.
--- Chat ended after 4 turns ---
(venv) root@106f05962c95:/workspace#
(venv) root@106f05962c95:/workspace# python3 /workspace/instance-3/chat.py --format int4 --reasoning --max-tokens 1024
Loading INT4 model from /workspace/instance-3/exported_models/ministral-3-3b-reasoning-int4...
Model loaded. Mode: reasoning, max 5 turns, 1024 tokens/response.
🧠 Reasoning mode ON — model will show its thinking process.
[Turn 1/5] You: Hi
Both `max_new_tokens` (=1024) and `max_length` (=262144) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Assistant [252 tok, 8.1s, 31.3 tok/s]:
💭 Thinking:
Okay, the user has just started a conversation with me. They've said "Hi" which is a common greeting. I need to respond in a friendly manner while being aware that this might be the start of a longer interaction. Since there's no specific question yet, perhaps it would be good to acknowledge their greeting and offer some general information or assistance.
But before jumping into anything too detailed, maybe I should keep it simple and open-ended to see where they go next.
Possible responses could include:
Let me try to craft a polite and welcoming response.
I'll say something like:
"I'm here to help! How can I assist you today?"
Alternatively, since the initial phrase was very short, maybe even simpler:
"Hello! What can I do for you today?"
Both options seem appropriate. Maybe the second one is slightly more direct but still warm.
Wait, perhaps the first option is better because it leaves room for them to elaborate without pressure.
Final decision: "I'm here to help! How can I assist you today?"
✅ Answer:
I'm here to help! How can I assist you today?
[Turn 2/5] You: what are the graph traversal algos? list 3 names.
Both `max_new_tokens` (=1024) and `max_length` (=262144) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Assistant [952 tok, 41.5s, 22.9 tok/s]:
💭 Thinking:
Alright, the user is asking about graph traversal algorithms and wants three examples listed. Let me recall what graph traversal algorithms exist.
Graph traversal typically refers to methods used to visit all nodes in a graph starting from a given node. Common ones include Depth-First Search (DFS), Breadth-First Search (BFS), and maybe others like Prim's algorithm or Kruskal's, but those are more for MSTs. For pure graph traversals, DFS and BFS are fundamental.
Perhaps also mention two more, maybe Union-Find or something else? Wait, maybe stick to basic unweighted graph traversal algorithms.
So, three main ones are probably:
Actually, let me double-check my memory. In many contexts, the primary graph traversal algorithms are DFS and BFS. There might be others like the "jump search," but that seems less likely.
Maybe the third one could be "Prim's algorithm" for minimum spanning tree, but that's not exactly traversal. Alternatively, maybe "Kahn's algorithm" for topological sorting?
Hmm, perhaps sticking to classic traversal methods:
Alternatively, perhaps "Prim's algorithm" is sometimes considered a type of graph traversal when building a MST.
To be safe, I’ll list DFS, BFS, and then perhaps "Prim’s algorithm" as it involves visiting vertices in order.
Now, how to present this clearly.
Response could be:
"The following are three common graph traversal algorithms:
Wait, is Prim’s really a traversal? It does involve visiting edges and nodes, so yes, it can be called a graph traversal method.
Alternatively, if we're looking for purely traversal methods without considering MST, maybe "Kahn's algorithm" for topological sort is acceptable.
But to avoid confusion, perhaps better to stick to clear definitions.
Another thought: maybe "Union-Find" is not a traversal, so perhaps skip that.
Okay, final selection:
Three types of graph traversal algorithms often taught together are:
Wait, perhaps the most straightforward answer is to list DFS, BFS, and then "Prim’s" or "Kruskal’s". To be precise, maybe:
The three most commonly referred-to graph traversal algorithms are:
This covers three distinct concepts related to graphs, focusing on traversal-like operations.
Now, format the answer properly.
✅ Answer:
Here are three common graph traversal algorithms:
Depth-First Search (DFS):
Breadth-First Search (BFS):
Prim’s Algorithm:
[Turn 3/5] You: Write shortest python code for DFS. No explanation, just give code.
Both `max_new_tokens` (=1024) and `max_length` (=262144) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Assistant [990 tok, 64.3s, 15.4 tok/s]:
[THINK] Okay, the task is to write the shortest Python code for Depth-First Search (DFS). The requirement is minimal—just the code without explanations.
For a simple undirected graph represented as an adjacency list, DFS can be implemented recursively. Here's a concise version:
But wait, the instruction says "shortest" code. So, perhaps we can make it even shorter by avoiding nested functions and using iterative approach instead, which might save characters.
Iterative DFS:
This is shorter in terms of lines, but perhaps not character-wise. However, the original recursive version is quite standard and clear, but the problem asks for the "shortest" code, possibly meaning fewest lines or characters.
Alternative attempt to minimize characters:
Assuming the graph is stored in a dictionary `g` and we call `dfs(g, 'start')`: This is a bit more compact, but perhaps the first one is clearer. Given the ambiguity of "shortest," I think providing the most concise functional implementation is best.
Final choice: the iterative version with minimal syntax.
Here's the code:
This uses a set for visited and a list as a stack. It avoids defining separate variables and tries to be as concise as possible.
But to ensure it works for the example, let's test it mentally:
Input: {'A':['B','C'], ...}, start 'A'
It prints 'A', adds B,C to queue, etc. Seems correct.
However, the line `q += [x for x in g.get(u, '') if x not in v]` may not work as intended because `v` is added after processing, so initially, `v` only contains 'A'. When u is popped, it checks neighbors against `v`, which hasn't been updated yet. This will cause infinite loops unless the check is done differently.
Ah, mistake! Need to add to `v` before appending to the queue. Corrected version:
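The corrected block itself is missing from the capture; following the names used in the surrounding analysis (`g`, `q`, `v`), it was presumably close to:

```python
def dfs(g, s):
    # Iterative DFS that prints nodes in visit order.
    v = {s}          # visited set, seeded with the start node
    q = [s]          # list used as a stack
    while q:
        u = q.pop()
        print(u)
        for x in g.get(u, []):
            if x not in v:
                v.add(x)     # mark *before* queueing, so no node is pushed twice
                q.append(x)
```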
This is now correct and somewhat concise.
Character count: perhaps we can remove spaces where possible, but readability is important. Assuming the above is acceptable.
Thus, the final code is:
This is under 80 characters (excluding whitespace for indentation, assuming single space indents).
If we consider that the function needs to take parameters and handle empty cases, this is a valid minimal implementation.
So, the answer is:
<- limit reached at max 1024 tokens, so the final answer wasn't printed.
[Turn 4/5] You: what is the time complexity? just give ans in 1 line.
Both `max_new_tokens` (=1024) and `max_length` (=262144) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Assistant [160 tok, 14.4s, 11.1 tok/s]:
💭 Thinking:
Okay, the user is asking for the time complexity of DFS in one line. From what I remember, for an undirected graph with V vertices and E edges, the time complexity of DFS is O(V + E).
Since the question is very brief, I'll provide the answer directly.
Time complexity of DFS: O(V + E)
But to confirm, is this always true? Yes, for connected components, and generally for the entire graph.
In the context of the previous code provided, which is an iterative DFS, the time complexity remains O(V + E) as it visits each vertex once and each edge twice (for undirected graphs).
So, the answer is simply:
Time complexity of DFS: O(V + E)
✅ Answer:
Time complexity of DFS: O(V + E)
[Turn 5/5] You: what about Dijkstra? ans in 1 line.
Both `max_new_tokens` (=1024) and `max_length` (=262144) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Assistant [140 tok, 13.7s, 10.2 tok/s]:
[THINK] Okay, the user is asking for the time complexity of Dijkstra's algorithm in one line.
From what I recall, Dijkstra's algorithm has a time complexity of O(E log V) for a graph with V vertices and E edges, assuming the use of a priority queue (like a min-heap).
Is this accurate? Yes, for weighted graphs, and it depends on the data structure used for the priority queue. If using binary heap, it's O((V + E) log V). Sometimes simplified to O(E log V) for sparse graphs.
Given the simplicity required, this should suffice.
Answer in one line:
Time complexity of Dijkstra: O(E log V)
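The O((V + E) log V) bound mentioned in the reasoning assumes a binary-heap priority queue; a minimal `heapq`-based sketch (assuming a dict-of-dicts weighted adjacency list, not code from the chat) looks like:

```python
import heapq

def dijkstra(g, s):
    # g: {node: {neighbor: weight}}; returns shortest distances from s.
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue  # stale queue entry, already relaxed via a shorter path
        for v, w in g.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

For `g = {'A': {'B': 1, 'C': 4}, 'B': {'C': 2}, 'C': {}}`, `dijkstra(g, 'A')` returns `{'A': 0, 'B': 1, 'C': 3}`.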
--- Chat ended after 5 turns ---
(venv) root@106f05962c95:/workspace#