Commit aefe3f0

mudler and github-actions[bot] authored and committed
chore(model gallery): 🤖 add new models via gallery agent
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 3a40b41 commit aefe3f0

1 file changed: +38 -0 lines changed

gallery/index.yaml

Lines changed: 38 additions & 0 deletions
@@ -23023,3 +23023,41 @@
     - filename: Evilmind-24B-v1.i1-Q4_K_M.gguf
       sha256: 22e56c86b4f4a8f7eb3269f72a6bb0f06a7257ff733e21063fdec6691a52177d
       uri: huggingface://mradermacher/Evilmind-24B-v1-i1-GGUF/Evilmind-24B-v1.i1-Q4_K_M.gguf
+- !!merge <<: *llava
+  name: "gelato-30b-a3b-i1"
+  urls:
+    - https://huggingface.co/mradermacher/Gelato-30B-A3B-i1-GGUF
+  description: |
+    ### 🍨 Gelato-30B-A3B – A State-of-the-Art Vision-Language Model for GUI Grounding
+
+    **Overview**
+    Gelato-30B-A3B is a high-performance, open-source vision-language model (VLM) specifically designed for computer-use agent tasks. Trained on the large-scale **Click-100k** dataset, it excels at locating UI elements in graphical user interfaces (GUIs), making it ideal for automated interaction with software, web applications, and operating systems.
+
+    **Key Features**
+    - **Base Model**: Built upon **Qwen3-VL-30B-A3B-Instruct**, a powerful multimodal LLM with strong reasoning and vision capabilities.
+    - **Specialized Training**: Fine-tuned using data curation and reinforcement learning to achieve superior grounding accuracy.
+    - **High Accuracy**: Achieves **63.88% on ScreenSpot-Pro** and **73.40% on OS-World-G**, outperforming prior specialized models like GTA1-32B and even larger VLMs such as Qwen3-VL-235B.
+    - **Efficient Inference**: Activated size of only **3.3 GB**, enabling efficient deployment on consumer hardware.
+    - **Open Source & Free**: Fully open-access under the Apache 2.0 license with full training code and datasets available.
+
+    **Use Cases**
+    - Automating repetitive GUI interactions (e.g., form filling, software navigation)
+    - Building AI agents for desktop and web automation
+    - Research in computer-use agent behavior and human-AI collaboration
+
+    **Inference Example**
+    Given a screen image and a natural language instruction like *"Reload the cache"*, Gelato outputs precise (x,y) coordinates of the target UI element—enabling accurate mouse clicks or touch actions.
+
+    **Model Link**
+    👉 [View on Hugging Face: mlfoundations-cua-dev/Gelato-30B-A3B](https://huggingface.co/mlfoundations-cua-dev/Gelato-30B-A3B)
+
+    **Ideal For**
+    Developers, AI researchers, and automation engineers seeking a lightweight, high-accuracy model for GUI interaction and agent-based tasks.
+    *Bonus*: When paired with GPT-5, it enables frontier-level agentic performance on OS-World.
+  overrides:
+    parameters:
+      model: Gelato-30B-A3B.i1-Q4_K_M.gguf
+  files:
+    - filename: Gelato-30B-A3B.i1-Q4_K_M.gguf
+      sha256: b353b25d0e193340dbf68261d930f5456adb2933a85d74be5296757d85337f45
+      uri: huggingface://mradermacher/Gelato-30B-A3B-i1-GGUF/Gelato-30B-A3B.i1-Q4_K_M.gguf
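
The new entry pins the GGUF file to a `sha256` digest, so a manually downloaded copy can be checked against it. The following is a minimal sketch of such a check, not part of this commit and not how the gallery tooling itself verifies files. The helper names `sha256_of_file` and `matches_gallery_entry` are hypothetical, and the expected digest is copied from the `files` block of the entry.

```python
import hashlib

# Digest copied from the `sha256` field of the gelato-30b-a3b-i1 gallery entry.
EXPECTED_SHA256 = "b353b25d0e193340dbf68261d930f5456adb2933a85d74be5296757d85337f45"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks.

    Chunked reading keeps memory use flat for multi-gigabyte model files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_gallery_entry(path, expected=EXPECTED_SHA256):
    """Hypothetical helper: compare a local file against the pinned digest."""
    return sha256_of_file(path) == expected.lower()
```

Assuming the file was saved locally as `Gelato-30B-A3B.i1-Q4_K_M.gguf`, calling `matches_gallery_entry("Gelato-30B-A3B.i1-Q4_K_M.gguf")` should return `True` only if the download is byte-identical to the file the entry describes.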
