
feat: Add MVP Demo for LLM UI Action Planning #14

Merged
abrichr merged 11 commits into main from feat/demo
Mar 29, 2025

Conversation


@abrichr abrichr commented Mar 28, 2025

Description:

This PR introduces the initial Minimum Viable Product (MVP) demo for planning UI actions using an LLM.

Key Features:

  • Synthetic UI: Generates a sample UI (synthetic_ui.py) to mock visual parser output for rapid prototyping.
  • LLM Planning: Takes the synthetic UI elements and a user goal, prompts an LLM (Anthropic via completions.py), and gets a structured action plan (core.py using Pydantic). The plan includes reasoning, action type, target element ID, and text-to-type.
  • Visualization: Highlights the LLM-chosen target element on the synthetic UI image.
  • Runnable Demo: Includes a top-level demo.py script to execute the flow.
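A rough sketch of what the structured plan might look like; the `LLMActionPlan` name and `text_to_type` field appear elsewhere in this PR, but the remaining field names and action types here are assumptions, not the actual `core.py` schema:

```python
from typing import Literal, Optional
from pydantic import BaseModel

class LLMActionPlan(BaseModel):
    reasoning: str                      # the LLM's justification for the chosen action
    action: Literal["click", "type"]    # action type to perform
    element_id: int                     # ID of the target UIElement
    text_to_type: Optional[str] = None  # populated only for "type" actions

# Example instance, as the planner might parse one from the LLM response:
plan = LLMActionPlan(
    reasoning="The username field must be filled before clicking Login.",
    action="type",
    element_id=0,
    text_to_type="admin",
)
```

Pydantic validates the fields on construction, so a malformed LLM response (e.g. an unknown action type) raises a validation error instead of silently producing a bad plan.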

Purpose:

Provides a working baseline demonstrating the core concept of LLM-driven UI action planning based on visual context.

To Run:

  1. Ensure ANTHROPIC_API_KEY is set in your environment or .env file.
  2. Run python demo.py.
  3. Check the demo_output/ directory for the generated images (login_screen and login_screen_highlighted).

abrichr added 11 commits March 27, 2025 22:37
Introduces a runnable demo (`demo.py`) proving the concept of using an LLM
to plan a UI action based on a user goal and mocked visual elements
(generated by `synthetic_ui.py`).

Includes:
- Core planning logic and prompting (`core.py`)
- Anthropic API integration (`completions.py`)
- Pydantic structured output for LLM response
- Visualization of the target UI element
Updates the demo script to loop through planning and simulating actions
on the synthetic UI (type, click). Includes goal completion check via LLM.
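The plan/simulate/check loop described in this commit could be sketched as follows; the callback names are placeholders for the real planning, simulation, and completion-check functions, not the actual API:

```python
def run_demo_loop(goal, elements, plan_action, simulate_action, is_goal_complete,
                  max_steps=5):
    """Plan -> simulate -> check, repeating until the goal is met or steps run out."""
    for _ in range(max_steps):
        plan = plan_action(goal, elements)          # LLM produces a structured plan
        elements = simulate_action(plan, elements)  # apply click/type to synthetic UI
        if is_goal_complete(goal, elements):        # LLM-based goal completion check
            return True
    return False
```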
@abrichr abrichr merged commit bc1acbc into main Mar 29, 2025
1 check passed
@abrichr abrichr deleted the feat/demo branch March 29, 2025 19:29

abrichr commented Mar 29, 2025

This gives us a solid baseline with the multi-step demo, tests, and CI.

The next logical step is moving toward interacting with real UIs. The main paths forward seem to be:

  1. Integrate Real OmniParser:

    • Replace synthetic_ui.py with actual screen capture (utils.take_screenshot).
    • Get the omniparser/client.py working to call a running OmniParser instance (needs URL config).
    • Map the OmniParser JSON response back to our List[UIElement].
  2. Implement Real Action Execution:

    • Use the MouseController/KeyboardController to actually perform the planned click/type.
    • Handle coordinate conversion (normalized bounds -> absolute screen pixels).
    • Map the LLMActionPlan (action, text_to_type) to controller calls.
    • Replace the simulate_action call in the loop.
  3. Refine LLM Interaction / Error Handling:

    • Improve prompt robustness for real UIs.
    • Handle LLM errors (bad JSON, invalid element IDs).
    • Handle action execution errors.
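A minimal sketch of the perception and execution pieces from #1 and #2, assuming a hypothetical OmniParser-style JSON payload and normalized (x, y, w, h) bounds; the payload field names here are guesses, not the real OmniParser schema:

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    id: int
    type: str        # e.g. "button", "text_field"
    content: str     # visible text or description
    bounds: tuple    # normalized (x, y, w, h), each in [0, 1]

def parse_omniparser_response(payload: dict) -> list:
    """Map a hypothetical OmniParser JSON payload onto our List[UIElement]."""
    return [
        UIElement(id=i, type=item["type"], content=item["content"],
                  bounds=tuple(item["bbox"]))
        for i, item in enumerate(payload.get("elements", []))
    ]

def to_absolute(bounds: tuple, screen_w: int, screen_h: int) -> tuple:
    """Normalized (x, y, w, h) -> absolute pixel center, for the mouse controller."""
    x, y, w, h = bounds
    return int((x + w / 2) * screen_w), int((y + h / 2) * screen_h)
```

Targeting the element's center (rather than its top-left corner) makes clicks robust to small bounding-box inaccuracies from the parser.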

Suggest we tackle #1 (Real OmniParser integration) first. We need the system to perceive the real UI state before we can reliably plan and execute actions on it.
