
Krilly 57dd294675 AI Newsletter Digest improvements: fixed QP soft line break decoding, URL extraction, and content cleaning

2026-03-04 13:29:22 +00:00

4.1 KiB


name	description
browser-use	Use when a task requires multiple steps, unknown UI, form filling, or parallel automation across multiple tabs. This launches autonomous AI agents that figure out the steps themselves.

Browser-Use Agent - Autonomous Automation

Overview

Launch AI sub-agents that complete multi-step browser tasks autonomously. This is VibeSurf's most powerful capability.

When to Use

browser-use is a high-level, task-oriented sub-agent approach. Use it for:

Complex tasks where you describe the goal and desired output and let the agent figure out the steps
Long workflows that would require many manual browser operations
Unknown or dynamic UI that needs autonomous exploration
Parallel automation across multiple tabs

Note: browser-use and the browser skill are complementary, not mutually exclusive. Both can accomplish the same tasks: browser-use provides higher-level automation, while browser gives you precise control.

Available Actions

Action	Description
`execute_browser_use_agent`	Runs one or more browser-use agent tasks. Specify tab_id to work on a specific tab. Each tab_id must be unique during parallel execution.

How It Works

Describe the goal, and the agent figures out the steps itself, for example:

Navigate to URLs
Find and interact with elements
Fill forms
Extract data
Return structured results

Task-Oriented Thinking

Good task descriptions:

✅ "Fill out the registration form with these details"
✅ "Search for Python tutorials and summarize the top 3"
✅ "Go to the login page, authenticate, then check the dashboard"

Bad task descriptions:

❌ "Click button" (too vague, use browser)
❌ "Extract prices" (use js_code instead)
❌ "Step 1: navigate, Step 2: click..." (let agent figure it out)
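The sketch below illustrates these guidelines by flagging descriptions that resemble the bad examples. It is a hypothetical pre-flight check and is not part of VibeSurf. The helper name `review_task_description`, the three-word threshold, and the "Step N:" pattern are assumptions chosen only for illustration.

```python
import re

# Hypothetical heuristic: matches "Step 1:", "step 2 :" and similar numbered steps.
STEP_PATTERN = re.compile(r"\bstep\s*\d+\s*:", re.IGNORECASE)


def review_task_description(description: str) -> list[str]:
    """Return warnings for a browser-use task description (hypothetical helper)."""
    warnings = []
    if len(description.split()) < 3:
        # Very short instructions ("Click button", "Extract prices") rarely state a goal;
        # a direct browser action or js_code is usually a better fit.
        warnings.append("too short to describe a goal; consider browser or js_code")
    if len(STEP_PATTERN.findall(description)) >= 2:
        # Numbered steps over-specify the workflow the agent should work out itself.
        warnings.append("lists explicit steps; describe the goal and desired output instead")
    return warnings


for text in [
    "Search for Python tutorials and summarize the top 3",
    "Click button",
    "Step 1: navigate, Step 2: click...",
]:
    print(repr(text), "->", review_task_description(text) or "ok")
```

A word-count threshold is a crude signal. Treat any warning as a prompt to rephrase the task, not as a hard rule.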

Working with Existing Tabs

🎯 Important: tab_id Selection

When the user refers to their already-open pages (e.g., "the current page", "from my open tabs", "the second tab"):

FIRST call get_browser_state to get all open tabs and their IDs

THEN use the correct tab_id from the response

NEVER set tab_id: null or omit it when the user means an existing tab; that creates a NEW tab

Key distinction:

tab_id: "existing_id" → Work on the user's existing tab

tab_id: null or omitted → Create a brand new tab
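The lookup can be sketched as follows. This is a minimal sketch that assumes get_browser_state returns a list of tabs with `tab_id`, `title`, and `url` fields, plus an `active_tab_id`. Those field names, the sample IDs, and the `resolve_tab_id` helper are hypothetical, so check the real response before relying on them.

```python
from typing import Optional

# Assumed (hypothetical) shape of the data returned by get_browser_state.
browser_state = {
    "tabs": [
        {"tab_id": "A1B2", "title": "Inbox", "url": "https://mail.example.com"},
        {"tab_id": "C3D4", "title": "Docs", "url": "https://docs.example.com"},
    ],
    "active_tab_id": "A1B2",
}


def resolve_tab_id(state: dict, reference: str) -> Optional[str]:
    """Map a user reference to a tab_id (hypothetical helper).

    reference is "current", a 1-based position such as "2", or "new".
    Returns None only for "new", i.e. when a brand new tab is wanted.
    """
    tabs = state["tabs"]
    if reference == "current":
        return state["active_tab_id"]
    if reference.isdigit():
        index = int(reference) - 1  # "2" means the second tab
        if 0 <= index < len(tabs):
            return tabs[index]["tab_id"]
        raise ValueError(f"no tab number {reference}; only {len(tabs)} open")
    if reference == "new":
        return None
    raise ValueError(f"unrecognized tab reference: {reference!r}")


print(resolve_tab_id(browser_state, "2"))        # C3D4
print(resolve_tab_id(browser_state, "current"))  # A1B2
```

The helper returns None only when the user explicitly wants a new tab. This makes the null-means-new-tab behavior a deliberate choice rather than an accident.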

Parallel Execution

Provide multiple tasks to run agents in parallel. Each task needs a unique tab_id for parallel execution.
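You can check the uniqueness rule before launching a batch. This sketch assumes each task is a dict with `task` and `tab_id` keys. That payload shape, the placeholder IDs, and `build_parallel_batch` are illustrative assumptions, not the documented schema of execute_browser_use_agent. The sketch checks only explicit (non-null) tab_id values.

```python
from collections import Counter


def build_parallel_batch(tasks: list[dict]) -> list[dict]:
    """Validate a batch of browser-use tasks before running them in parallel.

    Hypothetical helper: each task is assumed to look like {"task": str, "tab_id": str}.
    Raises ValueError if any explicit tab_id is shared by two or more tasks.
    """
    counts = Counter(t["tab_id"] for t in tasks if t.get("tab_id") is not None)
    duplicates = sorted(tab for tab, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate tab_id in parallel batch: {duplicates}")
    return tasks


# Placeholder tab IDs; in practice take them from get_browser_state.
batch = build_parallel_batch([
    {"task": "Summarize the top 3 Python tutorials on site A", "tab_id": "tab-1"},
    {"task": "Summarize the top 3 Python tutorials on site B", "tab_id": "tab-2"},
])
print(len(batch), "tasks ready")
```

Failing before dispatch is cheaper than letting two agents fight over the same tab mid-run.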

Best Practices

Practice	Why
Describe goal, not steps	Agent figures out navigation
Use parallel for independent tasks	Tasks run concurrently instead of one after another
One task per agent	Clear responsibilities
Unique tab_id per task	Required for parallel execution

Common Mistakes

Mistake	Fix
Over-specifying steps	Describe the goal and let the agent figure it out
Using it for a single click	Use `browser` instead
Using it for simple extraction	Use `js_code` or `crawl` instead
Duplicate tab_id in parallel runs	Give each agent a unique tab_id

Fallback Strategy

🔄 When browser-use Fails or Needs Manual Control

If execute_browser_use_agent fails, gets stuck, or you need more precise control:

Fall back to manual browser operations:

get_browser_state - Inspect current page state and available elements

browser.{action} - Perform specific action (click, input, navigate, etc.)

get_browser_state - Verify result and determine next action

Repeat this cycle until the task completes

This is the recommended recovery pattern, because browser-use and browser are complementary tools.
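The observe, act, and verify cycle can be sketched as a loop. In this sketch, `get_state`, `choose_action`, `perform`, and `is_done` are hypothetical callables. They stand in for get_browser_state, your own decision step, and browser.{action}. The `max_steps` limit is an assumed safeguard, not a documented setting.

```python
from typing import Callable, Optional


def manual_fallback(
    get_state: Callable[[], dict],
    choose_action: Callable[[dict], Optional[dict]],
    perform: Callable[[dict], None],
    is_done: Callable[[dict], bool],
    max_steps: int = 20,
) -> bool:
    """Observe -> act -> verify loop (hypothetical sketch).

    Returns True if the task completed within max_steps, False otherwise.
    """
    state = get_state()                  # inspect current page state
    for _ in range(max_steps):
        if is_done(state):
            return True
        action = choose_action(state)    # e.g. {"action": "click", "index": 3}
        if action is None:               # nothing sensible left to try
            return False
        perform(action)                  # stands in for browser.{action}
        state = get_state()              # verify the result
    return is_done(state)


# Toy stand-ins: a fake page that needs three clicks to reach "submitted".
page = {"clicks": 0}
done = manual_fallback(
    get_state=lambda: {"submitted": page["clicks"] >= 3},
    choose_action=lambda s: {"action": "click"},
    perform=lambda a: page.update(clicks=page["clicks"] + 1),
    is_done=lambda s: s["submitted"],
)
print(done, page["clicks"])  # True 3
```

Re-reading the state after every action means the loop never assumes an action succeeded. The step limit ensures that a stuck page ends the loop instead of hanging it.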

Choosing the Right Approach

Approach	Best For	Characteristics
browser-use	Complex, long tasks	Task-oriented, autonomous, describe goal + output
browser	Precise control needed	Step-by-step, explicit actions, full control
Hybrid	Best of both	Start with browser-use, fall back to browser if needed

Principle: Choose based on task complexity and control needs, not step count. Both can handle multi-step workflows and form filling.
