4.1 KiB
name, description
| name | description |
|---|---|
| browser-use | Use when task requires multiple steps, unknown UI, form filling, or parallel automation across multiple tabs. This launches autonomous AI agents that figure out the steps themselves. |
Browser-Use Agent - Autonomous Automation
Overview
Launch AI sub-agents that complete multi-step browser tasks autonomously. Most powerful VibeSurf capability.
When to Use
browser-use is a high-level, task-oriented sub-agent approach:
- Complex tasks where you describe the goal and desired output, let the agent figure out the steps
- Long workflows that would require many manual browser operations
- Unknown or dynamic UI that needs autonomous exploration
- Parallel automation across multiple tabs
Note: browser-use and browser skill are complementary, not mutually exclusive. Both can accomplish the same tasks - browser-use is higher-level automation, while browser gives you precise control.
Available Actions
| Action | Description |
|---|---|
execute_browser_use_agent |
Execute browser-use agent tasks. Specify tab_id to work on specific tab. Each tab_id must be unique during parallel execution. |
How It Works
Describe the goal, agent figures out the steps:
- Navigate to URLs
- Find and interact with elements
- Fill forms
- Extract data
- Return structured results
Task-Oriented Thinking
Good task descriptions:
- ✅ "Fill out the registration form with these details"
- ✅ "Search for Python tutorials and summarize top 3"
- ✅ "Go to login page, authenticate, then check dashboard"
Bad task descriptions:
- ❌ "Click button" (too vague, use
browser) - ❌ "Extract prices" (use
js_codeinstead) - ❌ "Step 1: navigate, Step 2: click..." (let agent figure it out)
Working with Existing Tabs
🎯 Important: tab_id Selection
When user refers to their already-opened pages (e.g., "the current page", "from my open tabs", "the second tab"):
- FIRST call
get_browser_stateto get all open tabs and their IDs- THEN use the correct
tab_idfrom the response- NEVER use
tab_id: nullor omit it - that creates a NEW tabKey distinction:
tab_id: "existing_id"→ Work on user's existing tabtab_id: nullor omitted → Create a brand new tab
Parallel Execution
Provide multiple tasks to run agents in parallel. Each task needs a unique tab_id for parallel execution.
Best Practices
| Practice | Why |
|---|---|
| Describe goal, not steps | Agent figures out navigation |
| Use parallel for independent tasks | Much faster |
| One task per agent | Clear responsibilities |
| Unique tab_id per task | Required for parallel execution |
Common Mistakes
| Mistake | Fix |
|---|---|
| Over-specifying steps | Describe goal, let agent figure it out |
| Using for single click | Use browser instead |
| Using for simple extraction | Use js_code or crawl instead |
| Duplicate tab_id in parallel | Each agent needs unique tab_id |
Fallback Strategy
🔄 When browser-use Fails or Needs Manual Control
If
execute_browser_use_agentfails, gets stuck, or you need more precise control:Seamlessly fallback to manual
browseroperations:
get_browser_state- Inspect current page state and available elementsbrowser.{action}- Perform specific action (click, input, navigate, etc.)get_browser_state- Verify result and determine next action- Repeat this cycle until task completes
This is the recommended recovery pattern - browser-use and browser are complementary tools.
Choosing the Right Approach
| Approach | Best For | Characteristics |
|---|---|---|
| browser-use | Complex, long tasks | Task-oriented, autonomous, describe goal + output |
| browser | Precise control needed | Step-by-step, explicit actions, full control |
| Hybrid | Best of both | Start with browser-use, fallback to browser if needed |
Principle: Choose based on task complexity and control needs, not step count. Both can handle multi-step workflows and form filling.