Files
openclaw-backups/skills/vibesurf/references/browser-use.md

108 lines
4.1 KiB
Markdown

---
name: browser-use
description: Use when task requires multiple steps, unknown UI, form filling, or parallel automation across multiple tabs. This launches autonomous AI agents that figure out the steps themselves.
---
# Browser-Use Agent - Autonomous Automation
## Overview
Launch AI sub-agents that complete multi-step browser tasks autonomously. **Most powerful VibeSurf capability.**
## When to Use
**browser-use is a high-level, task-oriented sub-agent approach:**
- Complex tasks where you describe the **goal** and desired **output**, let the agent figure out the steps
- Long workflows that would require many manual browser operations
- Unknown or dynamic UI that needs autonomous exploration
- Parallel automation across multiple tabs
**Note:** browser-use and `browser` skill are complementary, not mutually exclusive. Both can accomplish the same tasks - browser-use is higher-level automation, while `browser` gives you precise control.
## Available Actions
| Action | Description |
|--------|-------------|
| `execute_browser_use_agent` | Execute browser-use agent tasks. Specify tab_id to work on specific tab. Each tab_id must be unique during parallel execution. |
## How It Works
Describe the **goal**, agent figures out the **steps**:
- Navigate to URLs
- Find and interact with elements
- Fill forms
- Extract data
- Return structured results
## Task-Oriented Thinking
**Good task descriptions:**
- ✅ "Fill out the registration form with these details"
- ✅ "Search for Python tutorials and summarize top 3"
- ✅ "Go to login page, authenticate, then check dashboard"
**Bad task descriptions:**
- ❌ "Click button" (too vague, use `browser`)
- ❌ "Extract prices" (use `js_code` instead)
- ❌ "Step 1: navigate, Step 2: click..." (let agent figure it out)
## Working with Existing Tabs
> **🎯 Important: tab_id Selection**
>
> **When user refers to their already-opened pages** (e.g., "the current page", "from my open tabs", "the second tab"):
>
> 1. **FIRST** call `get_browser_state` to get all open tabs and their IDs
> 2. **THEN** use the correct `tab_id` from the response
> 3. **NEVER** use `tab_id: null` or omit it - that creates a NEW tab
>
> **Key distinction:**
> - `tab_id: "existing_id"` → Work on user's existing tab
> - `tab_id: null` or omitted → Create a brand new tab
## Parallel Execution
Provide multiple tasks to run agents in parallel. Each task needs a unique `tab_id` for parallel execution.
## Best Practices
| Practice | Why |
|----------|-----|
| Describe goal, not steps | Agent figures out navigation |
| Use parallel for independent tasks | Much faster |
| One task per agent | Clear responsibilities |
| Unique tab_id per task | Required for parallel execution |
## Common Mistakes
| Mistake | Fix |
|---------|-----|
| Over-specifying steps | Describe goal, let agent figure it out |
| Using for single click | Use `browser` instead |
| Using for simple extraction | Use `js_code` or `crawl` instead |
| Duplicate tab_id in parallel | Each agent needs unique tab_id |
## Fallback Strategy
> **🔄 When browser-use Fails or Needs Manual Control**
>
> If `execute_browser_use_agent` fails, gets stuck, or you need more precise control:
>
> **Seamlessly fallback to manual `browser` operations:**
> 1. `get_browser_state` - Inspect current page state and available elements
> 2. `browser.{action}` - Perform specific action (click, input, navigate, etc.)
> 3. `get_browser_state` - Verify result and determine next action
> 4. Repeat this cycle until task completes
>
> **This is the recommended recovery pattern** - browser-use and browser are complementary tools.
## Choosing the Right Approach
| Approach | Best For | Characteristics |
|----------|----------|-----------------|
| **browser-use** | Complex, long tasks | Task-oriented, autonomous, describe goal + output |
| **browser** | Precise control needed | Step-by-step, explicit actions, full control |
| **Hybrid** | Best of both | Start with browser-use, fallback to browser if needed |
**Principle:** Choose based on task complexity and control needs, not step count. Both can handle multi-step workflows and form filling.