AI Newsletter Digest improvements: fixed QP soft line break decoding, URL extraction, and content cleaning
This commit is contained in:
107
skills/vibesurf/references/browser-use.md
Normal file
107
skills/vibesurf/references/browser-use.md
Normal file
@@ -0,0 +1,107 @@
|
||||
---
|
||||
name: browser-use
|
||||
description: Use when task requires multiple steps, unknown UI, form filling, or parallel automation across multiple tabs. This launches autonomous AI agents that figure out the steps themselves.
|
||||
---
|
||||
|
||||
# Browser-Use Agent - Autonomous Automation
|
||||
|
||||
## Overview
|
||||
|
||||
Launch AI sub-agents that complete multi-step browser tasks autonomously. **Most powerful VibeSurf capability.**
|
||||
|
||||
## When to Use
|
||||
|
||||
**browser-use is a high-level, task-oriented sub-agent approach:**
|
||||
- Complex tasks where you describe the **goal** and desired **output**, let the agent figure out the steps
|
||||
- Long workflows that would require many manual browser operations
|
||||
- Unknown or dynamic UI that needs autonomous exploration
|
||||
- Parallel automation across multiple tabs
|
||||
|
||||
**Note:** browser-use and `browser` skill are complementary, not mutually exclusive. Both can accomplish the same tasks - browser-use is higher-level automation, while `browser` gives you precise control.
|
||||
|
||||
## Available Actions
|
||||
|
||||
| Action | Description |
|
||||
|--------|-------------|
|
||||
| `execute_browser_use_agent` | Execute browser-use agent tasks. Specify tab_id to work on specific tab. Each tab_id must be unique during parallel execution. |
|
||||
|
||||
## How It Works
|
||||
|
||||
Describe the **goal**, agent figures out the **steps**:
|
||||
- Navigate to URLs
|
||||
- Find and interact with elements
|
||||
- Fill forms
|
||||
- Extract data
|
||||
- Return structured results
|
||||
|
||||
## Task-Oriented Thinking
|
||||
|
||||
**Good task descriptions:**
|
||||
- ✅ "Fill out the registration form with these details"
|
||||
- ✅ "Search for Python tutorials and summarize top 3"
|
||||
- ✅ "Go to login page, authenticate, then check dashboard"
|
||||
|
||||
**Bad task descriptions:**
|
||||
- ❌ "Click button" (too vague, use `browser`)
|
||||
- ❌ "Extract prices" (use `js_code` instead)
|
||||
- ❌ "Step 1: navigate, Step 2: click..." (let agent figure it out)
|
||||
|
||||
## Working with Existing Tabs
|
||||
|
||||
> **🎯 Important: tab_id Selection**
|
||||
>
|
||||
> **When user refers to their already-opened pages** (e.g., "the current page", "from my open tabs", "the second tab"):
|
||||
>
|
||||
> 1. **FIRST** call `get_browser_state` to get all open tabs and their IDs
|
||||
> 2. **THEN** use the correct `tab_id` from the response
|
||||
> 3. **NEVER** use `tab_id: null` or omit it - that creates a NEW tab
|
||||
>
|
||||
> **Key distinction:**
|
||||
> - `tab_id: "existing_id"` → Work on user's existing tab
|
||||
> - `tab_id: null` or omitted → Create a brand new tab
|
||||
|
||||
## Parallel Execution
|
||||
|
||||
Provide multiple tasks to run agents in parallel. Each task needs a unique `tab_id` for parallel execution.
|
||||
|
||||
## Best Practices
|
||||
|
||||
| Practice | Why |
|
||||
|----------|-----|
|
||||
| Describe goal, not steps | Agent figures out navigation |
|
||||
| Use parallel for independent tasks | Much faster |
|
||||
| One task per agent | Clear responsibilities |
|
||||
| Unique tab_id per task | Required for parallel execution |
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
| Mistake | Fix |
|
||||
|---------|-----|
|
||||
| Over-specifying steps | Describe goal, let agent figure it out |
|
||||
| Using for single click | Use `browser` instead |
|
||||
| Using for simple extraction | Use `js_code` or `crawl` instead |
|
||||
| Duplicate tab_id in parallel | Each agent needs unique tab_id |
|
||||
|
||||
## Fallback Strategy
|
||||
|
||||
> **🔄 When browser-use Fails or Needs Manual Control**
|
||||
>
|
||||
> If `execute_browser_use_agent` fails, gets stuck, or you need more precise control:
|
||||
>
|
||||
> **Seamlessly fallback to manual `browser` operations:**
|
||||
> 1. `get_browser_state` - Inspect current page state and available elements
|
||||
> 2. `browser.{action}` - Perform specific action (click, input, navigate, etc.)
|
||||
> 3. `get_browser_state` - Verify result and determine next action
|
||||
> 4. Repeat this cycle until task completes
|
||||
>
|
||||
> **This is the recommended recovery pattern** - browser-use and browser are complementary tools.
|
||||
|
||||
## Choosing the Right Approach
|
||||
|
||||
| Approach | Best For | Characteristics |
|
||||
|----------|----------|-----------------|
|
||||
| **browser-use** | Complex, long tasks | Task-oriented, autonomous, describe goal + output |
|
||||
| **browser** | Precise control needed | Step-by-step, explicit actions, full control |
|
||||
| **Hybrid** | Best of both | Start with browser-use, fallback to browser if needed |
|
||||
|
||||
**Principle:** Choose based on task complexity and control needs, not step count. Both can handle multi-step workflows and form filling.
|
||||
Reference in New Issue
Block a user