23023 | 23023 | - filename: Evilmind-24B-v1.i1-Q4_K_M.gguf |
23024 | 23024 | sha256: 22e56c86b4f4a8f7eb3269f72a6bb0f06a7257ff733e21063fdec6691a52177d |
23025 | 23025 | uri: huggingface://mradermacher/Evilmind-24B-v1-i1-GGUF/Evilmind-24B-v1.i1-Q4_K_M.gguf |
| 23026 | +- !!merge <<: *qwen3vl |
| 23027 | + name: "gelato-30b-a3b-i1" |
| 23028 | + urls: |
| 23029 | + - https://huggingface.co/mradermacher/Gelato-30B-A3B-i1-GGUF |
| 23030 | + description: | |
| 23031 | + **Model Name:** Gelato-30B-A3B |
| 23032 | + **Base Model:** Qwen3-VL-30B-A3B-Instruct |
| 23033 | + **Repository:** [mlfoundations-cua-dev/Gelato-30B-A3B](https://huggingface.co/mlfoundations-cua-dev/Gelato-30B-A3B) |
| 23034 | + **Type:** Vision-Language Model (VLM) for GUI Grounding |
| 23035 | + **License:** Apache 2.0 |
| 23036 | + **Size:** 30B parameters (activated size: ~3.3B) |
| 23037 | + |
| 23038 | + **Description:** |
| 23039 | + Gelato-30B-A3B is a state-of-the-art vision-language model designed specifically for grounding tasks in graphical user interfaces (GUIs). Trained on the open-source **Click-100k** dataset, it achieves **63.88% accuracy on ScreenSpot-Pro** and **73.40% on OS-World-G**, outperforming larger models like Qwen3-VL-235B and specialized agents such as GTA1-32B. |
| 23040 | + |
| 23041 | + Built on the Qwen3-VL-30B-A3B-Instruct foundation, Gelato follows user instructions to locate UI elements in screenshots, outputting normalized (x, y) coordinates in the range [0, 1000]. It is intended for use in agentic systems, automation pipelines, and computer-use AI assistants. |
| 23042 | + |
| 23043 | + **Key Features:** |
| 23044 | + - Optimized for real-world GUI interaction tasks |
| 23045 | + - High accuracy despite moderate size (30B total, 3.3B activated) |
| 23046 | + - Open-source and compatible with Hugging Face Transformers |
| 23047 | + - Supports multimodal input (image + text) |
| 23048 | + - Designed for zero-shot object detection in screen interfaces |
| 23049 | + |
| 23050 | + **Use Case:** |
| 23051 | + Suited to building AI agents that interact with desktop or mobile UIs, for example in automated testing, assistive technology, or interactive screen navigation. |
| 23052 | + |
| 23053 | + **Inference Example:** |
| 23054 | + Given a screenshot and instruction like *"Reload the cache"*, Gelato predicts the exact UI element to click—ideal for integrating into end-to-end agentic workflows. |
| 23055 | + |
| 23056 | + 👉 **Try it out**: [mlfoundations-cua-dev/Gelato-30B-A3B](https://huggingface.co/mlfoundations-cua-dev/Gelato-30B-A3B) |
| 23057 | + 📊 **Benchmark Results**: [Evaluation Details](./evaluation) |
| 23058 | + 📁 **Dataset**: [Click-100k](https://huggingface.co/datasets/mlfoundations/clicks-100k) |
| 23059 | + overrides: |
| 23060 | + parameters: |
| 23061 | + model: Gelato-30B-A3B.i1-Q4_K_M.gguf |
| 23062 | + files: |
| 23063 | + - filename: Gelato-30B-A3B.i1-Q4_K_M.gguf |
| 23064 | + sha256: b353b25d0e193340dbf68261d930f5456adb2933a85d74be5296757d85337f45 |
| 23065 | + uri: huggingface://mradermacher/Gelato-30B-A3B-i1-GGUF/Gelato-30B-A3B.i1-Q4_K_M.gguf |
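The description says the model outputs normalized (x, y) coordinates in the range [0, 1000]. A minimal sketch of how a caller might map such a point back to screenshot pixels is shown below. It assumes the coordinates have already been parsed into numbers, since the description does not specify the raw output format. The function name `to_pixels` and the clamping to the last valid pixel are illustrative choices, not part of the model or of the gallery entry.

```python
def to_pixels(x_norm, y_norm, width, height, scale=1000):
    """Map a point normalized to [0, scale] onto a width x height screenshot.

    Hypothetical helper: the name and the clamping behavior are assumptions
    made for illustration only.
    """
    if not (0 <= x_norm <= scale and 0 <= y_norm <= scale):
        raise ValueError("normalized coordinates must lie in [0, scale]")
    # Scale to pixel space, then clamp so a value of `scale` maps to the
    # last valid pixel index rather than one past the edge.
    px = min(round(x_norm * width / scale), width - 1)
    py = min(round(y_norm * height / scale), height - 1)
    return px, py


if __name__ == "__main__":
    # A point at (500, 250) on a 1920x1080 screenshot.
    print(to_pixels(500, 250, 1920, 1080))
```

In an agent loop, the resulting pixel pair would typically be handed to whatever click or tap primitive the automation layer provides.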