|
23023 | 23023 | - filename: Evilmind-24B-v1.i1-Q4_K_M.gguf |
23024 | 23024 | sha256: 22e56c86b4f4a8f7eb3269f72a6bb0f06a7257ff733e21063fdec6691a52177d |
23025 | 23025 | uri: huggingface://mradermacher/Evilmind-24B-v1-i1-GGUF/Evilmind-24B-v1.i1-Q4_K_M.gguf |
| 23026 | +- !!merge <<: *llava |
| 23027 | + name: "gelato-30b-a3b-i1" |
| 23028 | + urls: |
| 23029 | + - https://huggingface.co/mradermacher/Gelato-30B-A3B-i1-GGUF |
| 23030 | + description: | |
| 23031 | + ### 🍨 Gelato-30B-A3B – A State-of-the-Art Vision-Language Model for GUI Grounding |
| 23032 | + |
| 23033 | + **Overview** |
| 23034 | + Gelato-30B-A3B is a high-performance, open-source vision-language model (VLM) specifically designed for computer-use agent tasks. Trained on the large-scale **Click-100k** dataset, it excels at locating UI elements in graphical user interfaces (GUIs), making it ideal for automated interaction with software, web applications, and operating systems. |
| 23035 | + |
| 23036 | + **Key Features** |
| 23037 | + - **Base Model**: Built upon **Qwen3-VL-30B-A3B-Instruct**, a powerful multimodal LLM with strong reasoning and vision capabilities. |
| 23038 | + - **Specialized Training**: Fine-tuned with curated training data and reinforcement learning to improve grounding accuracy. |
| 23039 | + - **High Accuracy**: Achieves **63.88% on ScreenSpot-Pro** and **73.40% on OS-World-G**, outperforming prior specialized models like GTA1-32B and even larger VLMs such as Qwen3-VL-235B. |
| 23040 | + - **Efficient Inference**: A mixture-of-experts model that activates only about **3.3B** of its 30B parameters per token, which lowers per-token compute on consumer hardware. |
| 23041 | + - **Open Source & Free**: Fully open-access under the Apache 2.0 license with full training code and datasets available. |
| 23042 | + |
| 23043 | + **Use Cases** |
| 23044 | + - Automating repetitive GUI interactions (e.g., form filling, software navigation) |
| 23045 | + - Building AI agents for desktop and web automation |
| 23046 | + - Research in computer-use agent behavior and human-AI collaboration |
| 23047 | + |
| 23048 | + **Inference Example** |
| 23049 | + Given a screen image and a natural language instruction like *"Reload the cache"*, Gelato outputs precise (x,y) coordinates of the target UI element—enabling accurate mouse clicks or touch actions. |
| 23050 | + |
| 23051 | + **Model Link** |
| 23052 | + 👉 [View on Hugging Face: mlfoundations-cua-dev/Gelato-30B-A3B](https://huggingface.co/mlfoundations-cua-dev/Gelato-30B-A3B) |
| 23053 | + |
| 23054 | + **Ideal For** |
| 23055 | + Developers, AI researchers, and automation engineers seeking a lightweight, high-accuracy model for GUI interaction and agent-based tasks. |
| 23056 | + *Bonus*: When paired with GPT-5, it enables frontier-level agentic performance on OS-World. |
| 23057 | + overrides: |
| 23058 | + parameters: |
| 23059 | + model: Gelato-30B-A3B.i1-Q4_K_M.gguf |
| 23060 | + files: |
| 23061 | + - filename: Gelato-30B-A3B.i1-Q4_K_M.gguf |
| 23062 | + sha256: b353b25d0e193340dbf68261d930f5456adb2933a85d74be5296757d85337f45 |
| 23063 | + uri: huggingface://mradermacher/Gelato-30B-A3B-i1-GGUF/Gelato-30B-A3B.i1-Q4_K_M.gguf |
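The `files` entry above pairs the GGUF filename with a `sha256` value, so a downloaded copy can be checked against it by hand. The following is a minimal sketch of such a check in Python, not the gallery tooling's own verification code. The `models/` directory and the helper names `sha256_of_file` and `matches_gallery_checksum` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Values copied from the `files` entry in the diff above.
EXPECTED_SHA256 = "b353b25d0e193340dbf68261d930f5456adb2933a85d74be5296757d85337f45"
MODEL_FILENAME = "Gelato-30B-A3B.i1-Q4_K_M.gguf"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_gallery_checksum(path, expected=EXPECTED_SHA256):
    """Compare a local file's digest with the expected one, ignoring hex case."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical download location; adjust to wherever the file was saved.
    local_path = Path("models") / MODEL_FILENAME
    if local_path.exists():
        ok = matches_gallery_checksum(local_path)
        print("checksum ok" if ok else "checksum mismatch")
    else:
        print(f"{local_path} not found")
```

Reading in chunks keeps memory use flat, which matters for a multi-gigabyte quantized model file.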