
Commit 1d0961c

mudler and github-actions[bot] authored and committed
chore(model gallery): 🤖 add new models via gallery agent
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 3a40b41 commit 1d0961c

1 file changed: +40 -0 lines changed

gallery/index.yaml

Lines changed: 40 additions & 0 deletions
@@ -23023,3 +23023,43 @@
     - filename: Evilmind-24B-v1.i1-Q4_K_M.gguf
       sha256: 22e56c86b4f4a8f7eb3269f72a6bb0f06a7257ff733e21063fdec6691a52177d
       uri: huggingface://mradermacher/Evilmind-24B-v1-i1-GGUF/Evilmind-24B-v1.i1-Q4_K_M.gguf
+- !!merge <<: *qwen3vl
+  name: "gelato-30b-a3b-i1"
+  urls:
+    - https://huggingface.co/mradermacher/Gelato-30B-A3B-i1-GGUF
+  description: |
+    **Model Name:** Gelato-30B-A3B
+    **Base Model:** Qwen3-VL-30B-A3B-Instruct
+    **Repository:** [mlfoundations-cua-dev/Gelato-30B-A3B](https://huggingface.co/mlfoundations-cua-dev/Gelato-30B-A3B)
+    **Type:** Vision-Language Model (VLM) for GUI Grounding
+    **License:** Apache 2.0
+    **Size:** 30B parameters (activated size: ~3.3B)
+
+    **Description:**
+    Gelato-30B-A3B is a state-of-the-art vision-language model designed specifically for grounding tasks in graphical user interfaces (GUIs). Trained on the open-source **Click-100k** dataset, it achieves **63.88% accuracy on ScreenSpot-Pro** and **73.40% on OS-World-G**, outperforming larger models like Qwen3-VL-235B and specialized agents such as GTA1-32B.
+
+    Built on the Qwen3-VL-30B-A3B-Instruct foundation, Gelato excels at understanding user instructions and locating UI elements in screenshots with high precision—outputting normalized (x, y) coordinates in the range [0, 1000]. It is ideal for use in agentic systems, automation pipelines, and computer-use AI assistants.
+
+    **Key Features:**
+    - Optimized for real-world GUI interaction tasks
+    - High accuracy despite moderate size (30B total, 3.3B activated)
+    - Open-source and compatible with Hugging Face Transformers
+    - Supports multimodal input (image + text)
+    - Designed for zero-shot object detection in screen interfaces
+
+    **Use Case:**
+    Perfect for building AI agents that interact with desktop or mobile UIs, such as automated testing, assistive technology, or interactive screen navigation.
+
+    **Inference Example:**
+    Given a screenshot and instruction like *"Reload the cache"*, Gelato predicts the exact UI element to click—ideal for integrating into end-to-end agentic workflows.
+
+    👉 **Try it out**: [mlfoundations-cua-dev/Gelato-30B-A3B](https://huggingface.co/mlfoundations-cua-dev/Gelato-30B-A3B)
+    📊 **Benchmark Results**: [Evaluation Details](./evaluation)
+    📁 **Dataset**: [Click-100k](https://huggingface.co/datasets/mlfoundations/clicks-100k)
+  overrides:
+    parameters:
+      model: Gelato-30B-A3B.i1-Q4_K_M.gguf
+  files:
+    - filename: Gelato-30B-A3B.i1-Q4_K_M.gguf
+      sha256: b353b25d0e193340dbf68261d930f5456adb2933a85d74be5296757d85337f45
+      uri: huggingface://mradermacher/Gelato-30B-A3B-i1-GGUF/Gelato-30B-A3B.i1-Q4_K_M.gguf
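
Note on the coordinate format: the description in this entry says the model outputs normalized (x, y) coordinates in the range [0, 1000]. A caller that wants to click on the screenshot would need to map these back to pixels. The sketch below assumes a plain linear mapping onto the screenshot's width and height. That mapping, the function name `to_pixels`, and the clamping at the right and bottom edges are illustrative assumptions, not something this commit or the model card excerpt specifies.

```python
def to_pixels(x_norm: float, y_norm: float, width: int, height: int,
              scale: float = 1000.0) -> tuple[int, int]:
    """Map a normalized (x, y) prediction in [0, scale] onto a width x height image.

    Assumption: the normalization is linear in each axis. `to_pixels` is a
    hypothetical helper, not part of LocalAI or the model's tooling.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if not (0 <= x_norm <= scale and 0 <= y_norm <= scale):
        raise ValueError(f"coordinates must lie in [0, {scale}]")

    # Scale each axis independently, then clamp to the last valid pixel index
    # so that a prediction of exactly `scale` does not fall outside the image.
    px = min(int(x_norm / scale * width), width - 1)
    py = min(int(y_norm / scale * height), height - 1)
    return px, py
```

For example, under these assumptions a prediction of (500, 250) on a 1920x1080 screenshot maps to pixel (960, 270).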

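Note on the `sha256` field: each `files` entry in the diff pairs a filename with a SHA-256 digest. As a standalone sanity check (not necessarily how LocalAI verifies downloads), a locally downloaded GGUF file could be compared against that digest with Python's standard `hashlib`. The helper names `sha256_of` and `matches_gallery_checksum` are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so multi-gigabyte model files are not loaded into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_gallery_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with the `sha256` value from a gallery entry."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Here `expected_hex` would be the `sha256` string copied from the entry, and `path` wherever the file was saved locally.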