Commit 5a01fe4

Author: rebootyang
chore: update port configuration and improve connection logic

- Change default port from 5000 to 5001 to avoid conflicts with macOS services (e.g., AirTunes)
- Integrate new DroidAgent config API using config_manager
- Improve device connection logic to properly handle emulators and network devices
- Add timeout handling and error handling mechanisms
- Support custom API base URL configuration via --api-base flag
- Update README documentation with additional configuration options and examples
- Update dependency lock file
1 parent 6733370 commit 5a01fe4
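
The port change above is motivated by another local service already listening on 5000. As a rough illustration (not part of this commit), a check like the following could tell whether a port is already taken before the environment server is started. `port_in_use` is a hypothetical helper name, and the sketch assumes the server is reached over plain TCP on the loopback interface.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the TCP connection succeeds.
        return sock.connect_ex((host, port)) == 0


# Report the state of the old and new default ports on this machine.
for candidate in (5000, 5001):
    state = "in use" if port_in_use(candidate) else "free"
    print(f"port {candidate}: {state}")
```

If 5001 is also occupied, the `--env-url` option shown in the diff below can point the CLI at a different port.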

8 files changed: +2709 -2445 lines changed

README.md

Lines changed: 61 additions & 6 deletions
@@ -93,7 +93,7 @@ droidworld check
 Execute from `droidrun-android-world` directory:
 ```bash
 # Example: add contact task
-droidworld run --tasks ContactsAddContact
+droidworld run --task ContactsAddContact
 ```
 
 ---
@@ -179,6 +179,12 @@ Run a specific task by name:
 droidworld run --task ContactsAddContact
 ```
 
+Run multiple specific tasks:
+
+```bash
+droidworld run --task ContactsAddContact --task ContactsDeleteContact
+```
+
 ### List Available Tasks
 
 View all available tasks with their IDs:
@@ -189,17 +195,66 @@ droidworld list-tasks
 
 ### Customizing the Benchmark
 
+#### LLM Provider Configuration
+
 ```bash
-# Run with a different LLM provider and model
-droidworld run --llm-provider Anthropic --llm-model claude-3-sonnet-20240229
+# Use Anthropic Claude
+droidworld run --task ContactsAddContact \
+  --llm-provider Anthropic \
+  --llm-model claude-3-sonnet-20240229
+
+# Use OpenAI-compatible API (e.g., third-party proxy)
+droidworld run --task ContactsAddContact \
+  --llm-provider OpenAILike \
+  --llm-model gemini-2.5-pro \
+  --api-base http://your-api-endpoint/v1
+
+# Enable vision and reasoning modes
+droidworld run --task ContactsAddContact \
+  --vision \
+  --reasoning
+```
+
+#### Task Family Selection
+
+Choose from different task families:
+- `android_world` (default): Full Android World task suite
+- `android`: Android-specific tasks
+- `miniwob`: MiniWoB tasks
+- `information_retrieval`: Information retrieval tasks
 
+```bash
+droidworld run --task-family android --min-task-idx 0 --max-task-idx 5
+```
+
+#### Performance Tuning
+
+```bash
 # Set maximum steps per task: multiplier * task complexity
-droidworld run --max-step-multiplier 15
+droidworld run --task ContactsAddContact --max-steps-multiplier 15
+
+# Set timeout: multiplier (in seconds) per task
+droidworld run --task ContactsAddContact --timeout-multiplier 300
+
+# Adjust LLM temperature
+droidworld run --task ContactsAddContact --temperature 0.7
+```
+
+#### Advanced Options
 
+```bash
 # Run multiple parameter combinations per task
-droidworld run --n-task-combinations 3
+droidworld run --task ContactsAddContact --n-task-combinations 3
+
+# Enable debug mode and tracing
+droidworld run --task ContactsAddContact --debug --tracing
+
+# Use custom environment URL and device serial
+droidworld run --task ContactsAddContact \
+  --env-url http://localhost:5001 \
+  --env-serial emulator-5554
 
-# Check all available configuration options with
+# Check all available configuration options
 droidworld run --help
 ```
 
droidrun

Submodule droidrun updated 149 files

eval/cli.py

Lines changed: 10 additions & 4 deletions
@@ -67,7 +67,7 @@ def version():
 @cli.command()
 @click.option(
     "--env-url",
-    default="http://localhost:5000",
+    default="http://localhost:5001",
     help="Android World Environment URL to use.",
 )
 def list_tasks(env_url):
@@ -81,7 +81,7 @@ def list_tasks(env_url):
 @cli.command()
 @click.option(
     "--env-url",
-    default="http://localhost:5000",
+    default="http://localhost:5001",
     help="Android World Environment URL to use.",
 )
 @click.option("--env-serial", default="emulator-5554", help="Device serial to use.")
@@ -110,7 +110,7 @@ def disable_overlay(env_serial):
 @cli.command()
 @click.option(
     "--env-url",
-    default="http://localhost:5000",
+    default="http://localhost:5001",
     help="Android World Environment URL to use.",
 )
 @click.option("--env-serial", default="emulator-5554", help="Device serial to use.")
@@ -124,6 +124,7 @@ def disable_overlay(env_serial):
 )
 @click.option("--llm-provider", default="Gemini", help="LLM provider to use.")
 @click.option("--llm-model", default="gemini-2.5-pro", help="LLM model to use.")
+@click.option("--api-base", default=None, help="Base URL for API (e.g., OpenAI-compatible API).")
 @click.option("--vision", is_flag=True, help="Enable vision.")
 @click.option("--reasoning", is_flag=True, help="Enable reasoning.")
 @click.option("--reflection", is_flag=True, help="Enable reflection.")
@@ -144,6 +145,7 @@ async def run(
     n_task_combinations,
     llm_provider,
     llm_model,
+    api_base,
     vision,
     reasoning,
     reflection,
@@ -185,7 +187,11 @@ async def run(
     logger.info(f"Found tasks: {', '.join(task_list)} ({len(task_list)})")
 
     logger.debug(f"Loading LLM: {llm_provider} {llm_model} {temperature}")
-    llm = load_llm(llm_provider, model=llm_model, temperature=temperature)
+    llm_kwargs = {"model": llm_model, "temperature": temperature}
+    if api_base:
+        llm_kwargs["api_base"] = api_base
+        logger.debug(f"Using custom API base: {api_base}")
+    llm = load_llm(llm_provider, **llm_kwargs)
     logger.debug("LLM loaded successfully")
 
     for task_name in task_list:
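
The last hunk above passes `api_base` to `load_llm` only when the flag was given, so the provider's own default endpoint applies otherwise. A minimal standalone sketch of that pattern follows. `build_llm_kwargs` is a hypothetical helper introduced for illustration and does not exist in the repository.

```python
from typing import Any, Optional


def build_llm_kwargs(
    model: str, temperature: float, api_base: Optional[str] = None
) -> dict[str, Any]:
    """Collect keyword arguments, adding api_base only when it is set."""
    kwargs: dict[str, Any] = {"model": model, "temperature": temperature}
    if api_base:  # None and "" both leave the key out
        kwargs["api_base"] = api_base
    return kwargs


print(build_llm_kwargs("gemini-2.5-pro", 0.2))
print(build_llm_kwargs("gemini-2.5-pro", 0.2, "http://your-api-endpoint/v1"))
```

Leaving the key out entirely, rather than passing `api_base=None`, avoids overriding whatever default the underlying LLM class would otherwise pick.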

eval/env/boot.py

Lines changed: 12 additions & 5 deletions
@@ -24,13 +24,20 @@
 
 def ensure_connected(serial: str) -> AdbDevice:
     try:
-        res = adb.connect(serial)
-        if res.count("failed") > 0 or res.count("unable") > 0:
-            raise res
+        # For emulator devices (emulator-*), they're already connected locally
+        # Only try network connect for IP addresses
+        if ":" in serial or not serial.startswith("emulator-"):
+            res = adb.connect(serial)
+            if res.count("failed") > 0 or res.count("unable") > 0:
+                raise RuntimeError(f"Failed to connect: {res}")
+
+        # Verify device is available
+        device = adb.device(serial)
+        # Test if device is accessible
+        device.shell("echo test")
+        return device
     except Exception as e:
         raise RuntimeError(f"Device {serial} is not connected: {e}")
-
-    return adb.device(serial)
 
 
 def install_portal(device: AdbDevice):
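
To see which serials take the `adb.connect` path under the new condition, the predicate can be evaluated on its own. The sketch below copies the condition from the hunk above into a hypothetical function, `needs_adb_connect`. Every serial other than `emulator-5554` is a made-up example.

```python
def needs_adb_connect(serial: str) -> bool:
    """Same condition as the hunk: skip connect only for serials that
    start with "emulator-" and contain no colon."""
    return ":" in serial or not serial.startswith("emulator-")


# Made-up serials: a local emulator, two host:port targets, a USB-style serial.
examples = ["emulator-5554", "192.168.1.20:5555", "localhost:5555", "R58M12ABCDE"]
for serial in examples:
    print(f"{serial!r}: connect={needs_adb_connect(serial)}")
```

Under this condition a USB-style serial without a colon also takes the connect path, which is broader than the in-code comment "Only try network connect for IP addresses" suggests.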

eval/env/client.py

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ def parse_element(data: dict[str, Any]) -> representation_utils.UIElement:
 class AndroidEnvClient:
     """Client for interacting with the Android environment server."""
 
-    def __init__(self, base_url: str = "http://localhost:5000"):
+    def __init__(self, base_url: str = "http://localhost:5001"):
         logger.info(
             "Setting up Android environment using Docker - Initial setup may take"
             " 5-10 minutes. Please wait..."

eval/runner.py

Lines changed: 46 additions & 13 deletions
@@ -57,18 +57,41 @@ async def run_task_on_env(
     )
 
     tools = AndroidWorldTools(device_serial, env)
+
+    # Import new config classes for DroidAgent
+    from droidrun.config_manager.config_manager import (
+        DroidrunConfig,
+        AgentConfig,
+        DeviceConfig,
+        LoggingConfig,
+        TracingConfig,
+        ManagerConfig,
+        ExecutorConfig,
+        CodeActConfig,
+    )
+
+    # Build config for new DroidAgent API
+    agent_config = AgentConfig(
+        reasoning=reasoning,
+        max_steps=max_steps,
+        manager=ManagerConfig(vision=vision),
+        executor=ExecutorConfig(vision=vision),
+        codeact=CodeActConfig(vision=vision),
+    )
+
+    config = DroidrunConfig(
+        agent=agent_config,
+        device=DeviceConfig(),
+        logging=LoggingConfig(debug=debug, save_trajectory="none"),
+        tracing=TracingConfig(enabled=tracing),
+    )
+
     agent = DroidAgent(
         goal=task_goal,
-        llm=llm,
+        config=config,
+        llms=llm,  # New API accepts single LLM for all agents
         tools=tools,
-        reasoning=reasoning,
-        enable_tracing=tracing,
-        debug=debug,
-        max_steps=max_steps,
         timeout=timeout,
-        save_trajectories="none",
-        reflection=reflection,
-        vision=vision,
     )
 
     logger.debug("DroidAgent initialized successfully")
@@ -95,15 +118,25 @@ async def run_task_on_env(
         logger.warn(f"Droidrun timed out for task {task_name} {task_idx}: {e}")
         score = env.get_task_score(task_name, task_idx)
         logger.info(f"Task {task_name} {task_idx} score: {score}")
+
+        # Create a simple result object for timeout
+        class TimeoutResult:
+            def __init__(self):
+                self.success = False
+                self.reason = f"Timeout after {timeout} seconds"
+                # Handle both old and new API for step counter
+                if hasattr(agent, 'step_counter'):
+                    self.steps = agent.step_counter
+                elif hasattr(agent, 'shared_state') and hasattr(agent.shared_state, 'step_number'):
+                    self.steps = agent.shared_state.step_number
+                else:
+                    self.steps = 0
+
         result = get_task_result(
             task_result,
             agent,
             score=score,
-            agent_result={
-                "steps": agent.step_counter,
-                "success": False,
-                "reason": f"Timeout after {timeout} seconds",
-            },
+            agent_result=TimeoutResult(),
             device=device_serial,
         )
     except Exception as e:
except Exception as e:

eval/tracker.py

Lines changed: 19 additions & 4 deletions
@@ -111,9 +111,17 @@ def get_task_result(
         task_result.success = score
 
     if agent_result is not None:
-        task_result.agent_success = agent_result["success"]
-        task_result.steps_taken = agent_result["steps"]
-        task_result.final_thought = agent_result["reason"]
+        # Handle both old dict format and new ResultEvent format
+        if hasattr(agent_result, 'success'):
+            # New API: ResultEvent object
+            task_result.agent_success = agent_result.success
+            task_result.steps_taken = agent_result.steps
+            task_result.final_thought = agent_result.reason
+        else:
+            # Old API: dict
+            task_result.agent_success = agent_result["success"]
+            task_result.steps_taken = agent_result["steps"]
+            task_result.final_thought = agent_result["reason"]
 
     if error is not None:
         task_result.error = error
@@ -126,7 +134,14 @@ def get_task_result(
     task_result.trajectory_stats = TrajectoryStats(
         **get_trajectory_statistics(task_result.trajectory)
     )
-    task_result.reasoning = agent.reasoning
+    # Handle both old and new API for reasoning attribute
+    if hasattr(agent, 'reasoning'):
+        task_result.reasoning = agent.reasoning
+    elif hasattr(agent, 'config') and hasattr(agent.config, 'agent'):
+        task_result.reasoning = agent.config.agent.reasoning
+    else:
+        task_result.reasoning = False
+
     if device is not None:
         task_result.device = device
 
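
The first hunk above accepts either an object with `success`, `steps`, and `reason` attributes or the older dict with the same keys. A minimal sketch of that dual-format read, pulled into a hypothetical helper named `unpack_agent_result` (not present in the repository), with a `SimpleNamespace` standing in for the result object:

```python
from types import SimpleNamespace
from typing import Any, Tuple


def unpack_agent_result(agent_result: Any) -> Tuple[bool, int, str]:
    """Return (success, steps, reason) from an object or a legacy dict."""
    if hasattr(agent_result, "success"):
        # Attribute-style result (the "new API" branch in the hunk).
        return agent_result.success, agent_result.steps, agent_result.reason
    # Dict-style result (the "old API" branch in the hunk).
    return agent_result["success"], agent_result["steps"], agent_result["reason"]


as_object = SimpleNamespace(success=True, steps=9, reason="done")
as_dict = {"success": False, "steps": 4, "reason": "Timeout after 300 seconds"}

print(unpack_agent_result(as_object))
print(unpack_agent_result(as_dict))
```

A plain dict has no `success` attribute, so `hasattr` is enough to route it to the key-based branch.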
