---
name: browser
description: Use when user needs direct browser control like navigating to URLs, clicking elements, typing text, scrolling, switching tabs, taking screenshots, or inspecting page state.
---
# Browser - Direct Browser Control
## Overview
Direct, step-by-step control of browser interactions. Best suited to simple, single actions and to workflows where each step needs to be checked.
## When to Use
**browser provides low-level, precise control over browser operations:**
- When you need explicit control over each action
- When you want to verify results at each step
- As a fallback when `browser-use` agent fails or gets stuck
- For any browser automation task (navigation, clicking, form filling, etc.)
**Note:** browser and `browser-use` are complementary. Both can accomplish the same tasks - browser gives precise step-by-step control, while browser-use provides high-level task automation.
## Available Actions
| Action | Description |
|--------|-------------|
| `get_browser_state` | Get current browser state including tabs, DOM content, and highlighted screenshot |
| `browser.get_element_info` | Get element details by index (xpath, position, attributes, visibility) |
| `browser.search` | Google search |
| `browser.navigate` | Navigate to URL |
| `browser.go_back` | Go back |
| `browser.wait` | Wait for condition |
| `browser.click` | Click element |
| `browser.input` | Input text into field |
| `browser.switch` | Switch to another tab by tab_id |
| `browser.close` | Close a tab by tab_id |
| `browser.extract` | LLM extracts structured data from page markdown |
| `browser.scroll` | Scroll page |
| `browser.send_keys` | Send keyboard keys |
| `browser.find_text` | Scroll to and find text |
| `browser.dropdown_options` | Get dropdown options |
| `browser.select_dropdown` | Select dropdown option |
| `browser.evaluate` | Execute JavaScript in browser |
| `browser.hover` | Hover on element |
| `browser.download_media` | Download media from URL |
| `browser.get_html_content` | Get HTML content and save to file |
| `browser.reload_page` | Refresh current page |
| `browser.start_console_logging` | Start monitoring console logs (console.log/warn/error) |
| `browser.stop_console_logging` | Stop console logging and retrieve all logs |
| `browser.start_network_logging` | Start monitoring network traffic (requests/responses) |
| `browser.stop_network_logging` | Stop network logging and get HAR file |
## Key Actions
### Getting Started
- `get_browser_state` - Always call first to see current state
### Navigation
- `browser.navigate` - Go to URL
- `browser.go_back` - Go back
- `browser.reload_page` - Refresh page
### Interaction
- `browser.click` - Click element
- `browser.input` - Type text
- `browser.send_keys` - Send keyboard keys
- `browser.hover` - Hover on element
### Data & Info
- `browser.get_element_info` - Inspect element details (xpath, position, attributes)
- `browser.extract` - LLM extracts structured data from page
- `browser.get_html_content` - Get full HTML
- `browser.find_text` - Find and scroll to text
### Tabs
- `browser.switch` - Switch to tab by tab_id (last 4 chars of target_id)
- `browser.close` - Close tab by tab_id
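
The tab_id rule above (last 4 characters of target_id) can be sketched in Python. The helper name and the sample `target_id` value are hypothetical and chosen only for illustration; the real format of `target_id` is not specified here.

```python
def tab_id_from_target_id(target_id: str) -> str:
    """Return the short tab ID: the last 4 characters of a target_id."""
    if len(target_id) < 4:
        raise ValueError("target_id must have at least 4 characters")
    return target_id[-4:]


# Hypothetical target_id, for illustration only.
example_target_id = "8F3A6C1D9B2E4A7F"
print(tab_id_from_target_id(example_target_id))  # prints "4A7F"
```

The resulting value is what `browser.switch` and `browser.close` expect as tab_id.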
### Advanced
- `browser.evaluate` - Execute custom JavaScript
### Debugging & Testing
- `browser.start_console_logging` - Start monitoring console logs
- `browser.stop_console_logging` - Stop and retrieve console logs (saved to file)
- `browser.start_network_logging` - Start monitoring network traffic
- `browser.stop_network_logging` - Stop and retrieve network logs as HAR file
**Use case**: Website testing, local frontend/backend debugging, reverse engineering
**Workflow**: Call `start_*` first, perform actions, then call `stop_*` to get logs
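
A minimal sketch of the start/act/stop workflow, assuming a generic `call_tool(name, **kwargs)` dispatcher. That dispatcher, the stub below, and the shape of the returned logs are assumptions for illustration; only the action names come from the table above.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List

ToolCaller = Callable[..., Any]


@contextmanager
def console_logging(call_tool: ToolCaller) -> Iterator[Dict[str, Any]]:
    """Start console logging, yield a result holder, and always stop on exit."""
    result: Dict[str, Any] = {}
    call_tool("browser.start_console_logging")
    try:
        yield result
    finally:
        # The stop action retrieves the logs; keep whatever it returns.
        result["logs"] = call_tool("browser.stop_console_logging")


# Stub dispatcher (assumption): records calls instead of driving a browser.
calls: List[str] = []


def fake_call_tool(name: str, **kwargs: Any) -> Any:
    calls.append(name)
    return ["(stub log)"] if name.endswith("stop_console_logging") else None


with console_logging(fake_call_tool) as captured:
    fake_call_tool("browser.reload_page")  # the actions under observation

print(calls)
print(captured["logs"])
```

Wrapping the pair in a context manager means the `stop_*` call still runs if an action in between raises, so logging is not left running.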
## Relationship with browser-use
**browser and browser-use are complementary, not exclusive:**
- **browser-use**: High-level, task-oriented sub-agent (describe goal, agent figures out steps)
- **browser**: Low-level, precise control (explicit step-by-step operations)
**When to prefer browser-use:**
- Complex tasks with long workflows
- When describing the goal is easier than specifying steps
**When to prefer browser:**
- Need precise control over each action
- Want to verify intermediate results
- browser-use failed or got stuck (use as fallback)
**Best practice:** Try browser-use first for complex tasks, then fall back to browser for manual control if needed.
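
The try-then-fall-back practice can be sketched as follows. Both callables are hypothetical stand-ins, and treating an exception as the failure signal is an assumption; how browser-use actually reports that it failed or got stuck is not specified here.

```python
from typing import Callable, Tuple


def run_with_fallback(
    goal: str,
    high_level: Callable[[str], str],
    manual: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the high-level agent first; fall back to manual steps on failure."""
    try:
        return ("browser-use", high_level(goal))
    except Exception as exc:  # assumption: failure surfaces as an exception
        print(f"browser-use failed ({exc}); falling back to browser")
        return ("browser", manual(goal))


# Hypothetical stand-ins for the two approaches.
def stuck_agent(goal: str) -> str:
    raise RuntimeError("agent got stuck")


def manual_steps(goal: str) -> str:
    return f"completed manually: {goal}"


outcome = run_with_fallback("submit the contact form", stuck_agent, manual_steps)
print(outcome)
```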
## Best Practices
1. Call `get_browser_state` first to see available elements
2. Use `browser.get_element_info` to inspect element details (xpath, attributes, etc.)
3. Use element indices/IDs from browser state
4. Tab IDs are the last 4 characters of `target_id`
5. Use `browser.extract` for LLM-based extraction from page markdown
6. Use `browser.evaluate` for custom JavaScript operations
## Manual Control Pattern (Fallback from browser-use)
> **🎯 Iterative Control Loop**
>
> Use this pattern when browser-use fails or you need precise control:
>
> ```
> 1. get_browser_state → See page state, available elements
> 2. browser.{action} → Perform action (click, input, navigate...)
> 3. get_browser_state → Verify result, plan next step
> 4. Repeat until complete
> ```
>
> **This pattern works for any task** - form filling, navigation, data extraction, etc.
> It's the manual alternative to browser-use's autonomous approach.
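
The control loop above can be sketched in Python under some assumptions: `call_tool` is a hypothetical dispatcher, `plan_next` is a caller-supplied function that picks the next action from the observed state, and the `url` argument and state fields are invented for the stub. Only the action names come from this document.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Action = Tuple[str, Dict[str, Any]]


def control_loop(
    call_tool: Callable[..., Any],
    plan_next: Callable[[Any], Optional[Action]],
    max_steps: int = 20,
) -> List[Action]:
    """Observe, act, observe again; stop when plan_next returns None."""
    history: List[Action] = []
    state = call_tool("get_browser_state")  # step 1: see page state
    for _ in range(max_steps):
        action = plan_next(state)
        if action is None:
            break  # task complete
        name, args = action
        call_tool(name, **args)  # step 2: perform one action
        history.append(action)
        state = call_tool("get_browser_state")  # step 3: verify result
    return history


# Stub browser (assumption): tracks only the current URL.
fake_state = {"url": "about:blank"}


def fake_call_tool(name: str, **kwargs: Any) -> Dict[str, str]:
    if name == "browser.navigate":
        fake_state["url"] = kwargs["url"]
    return dict(fake_state)


def plan_next(state: Dict[str, str]) -> Optional[Action]:
    if state["url"] != "https://example.com":
        return ("browser.navigate", {"url": "https://example.com"})
    return None


history = control_loop(fake_call_tool, plan_next)
print(history)
```

The `max_steps` cap is a design choice of this sketch: it keeps the loop from running indefinitely when the page never reaches the expected state.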