Krilly 57dd294675 AI Newsletter Digest improvements: fixed QP soft line break decoding, URL extraction, and content cleaning

2026-03-04 13:29:22 +00:00

5.2 KiB

name	description
browser	Use when the user needs direct browser control, such as navigating to URLs, clicking elements, typing text, scrolling, switching tabs, taking screenshots, or inspecting page state.

Browser - Direct Browser Control

Overview

Direct control of browser interactions, one simple action per call.

When to Use

browser provides low-level, precise control over browser operations. Use it:

When you need explicit control over each action
When you want to verify results at each step
As a fallback when the browser-use agent fails or gets stuck
For any browser automation task (navigation, clicking, form filling, etc.)

Note: browser and browser-use are complementary. Both can accomplish the same tasks: browser gives precise step-by-step control, while browser-use provides high-level task automation.

Available Actions

Action	Description
`get_browser_state`	Get current browser state including tabs, DOM content, and highlighted screenshot
`browser.get_element_info`	Get element details by index (xpath, position, attributes, visibility)
`browser.search`	Google search
`browser.navigate`	Navigate to URL
`browser.go_back`	Go back
`browser.wait`	Wait for condition
`browser.click`	Click element
`browser.input`	Input text into field
`browser.switch`	Switch to another tab by tab_id
`browser.close`	Close a tab by tab_id
`browser.extract`	Use an LLM to extract structured data from the page markdown
`browser.scroll`	Scroll page
`browser.send_keys`	Send keyboard keys
`browser.find_text`	Find text and scroll to it
`browser.dropdown_options`	Get dropdown options
`browser.select_dropdown`	Select dropdown option
`browser.evaluate`	Execute JavaScript in browser
`browser.hover`	Hover over element
`browser.download_media`	Download media from URL
`browser.get_html_content`	Get HTML content and save to file
`browser.reload_page`	Refresh current page
`browser.start_console_logging`	Start monitoring console logs (console.log/warn/error)
`browser.stop_console_logging`	Stop console logging and retrieve all logs
`browser.start_network_logging`	Start monitoring network traffic (requests/responses)
`browser.stop_network_logging`	Stop network logging and get HAR file

Key Actions

Getting Started

get_browser_state - Always call this first to see the current state

browser.navigate - Go to URL
browser.go_back - Go back
browser.reload_page - Refresh page

Interaction

browser.click - Click element
browser.input - Type text
browser.send_keys - Send keyboard keys
browser.hover - Hover over element

Data & Info

browser.get_element_info - Inspect element details (xpath, position, attributes)
browser.extract - Use an LLM to extract structured data from the page
browser.get_html_content - Get the full HTML (saved to file)
browser.find_text - Find and scroll to text

Tabs

browser.switch - Switch to tab by tab_id (the last 4 characters of its target_id)
browser.close - Close tab by tab_id
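Since a tab_id is just the last 4 characters of a target_id, deriving one is a one-liner. A minimal Python sketch; the helper name tab_id_from_target_id and the sample target_id are made up for illustration:

```python
def tab_id_from_target_id(target_id: str) -> str:
    """Return the short tab_id accepted by browser.switch / browser.close.

    Assumes, as this skill states, that a tab_id is simply the last
    4 characters of the tab's target_id.
    """
    if len(target_id) < 4:
        raise ValueError(f"target_id too short: {target_id!r}")
    return target_id[-4:]


# Hypothetical target_id, for illustration only.
print(tab_id_from_target_id("E3A1F0C29B7D4E5F8A6B"))  # 8A6B
```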

Advanced

browser.evaluate - Execute custom JavaScript

Debugging & Testing

browser.start_console_logging - Start monitoring console logs
browser.stop_console_logging - Stop and retrieve console logs (saved to file)
browser.start_network_logging - Start monitoring network traffic
browser.stop_network_logging - Stop and retrieve network logs as HAR file
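stop_network_logging hands back a HAR file. Assuming it follows the standard HAR 1.2 JSON layout (this skill does not state the version), a short Python sketch can list the requests that failed. The helpers load_har and failed_requests and the sample data are hypothetical, not part of the browser tool:

```python
import json
from pathlib import Path
from typing import Any


def load_har(path: str) -> dict[str, Any]:
    """Load a HAR file from wherever stop_network_logging saved it."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def failed_requests(har: dict[str, Any]) -> list[tuple[str, str, int]]:
    """Return (method, url, status) for entries with an error status.

    Reads only standard HAR fields: log.entries[].request.method,
    log.entries[].request.url and log.entries[].response.status.
    """
    failures = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry.get("response", {}).get("status", 0)
        # 4xx/5xx are HTTP errors; browsers commonly record 0 when no response arrived.
        if status == 0 or status >= 400:
            request = entry.get("request", {})
            failures.append((request.get("method", "?"), request.get("url", "?"), status))
    return failures


# Usage with an inline HAR document (made-up data, for illustration only).
sample = {
    "log": {
        "version": "1.2",
        "entries": [
            {"request": {"method": "GET", "url": "http://localhost:3000/"}, "response": {"status": 200}},
            {"request": {"method": "POST", "url": "http://localhost:3000/api/login"}, "response": {"status": 500}},
        ],
    }
}
for method, url, status in failed_requests(sample):
    print(status, method, url)  # 500 POST http://localhost:3000/api/login
```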

Use cases: website testing, local frontend/backend debugging, reverse engineering.
Workflow: call start_* first, perform the actions, then call stop_* to retrieve the logs.
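One way to make sure a stop_* call is never skipped is to wrap the actions in a context manager with the stop call in a finally clause. This is a sketch under assumptions: call_tool(name, **args) is a hypothetical stand-in for however your runtime invokes these actions, and the url/index argument names in the usage are illustrative only.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator

# Hypothetical: call_tool(action_name, **arguments) -> result.
# The real invocation mechanism depends on the agent runtime.
ToolCaller = Callable[..., Any]


@contextmanager
def logging_session(call_tool: ToolCaller, kind: str) -> Iterator[dict[str, Any]]:
    """Run start_<kind>_logging on entry and stop_<kind>_logging on exit.

    kind is "console" or "network". The stop call sits in `finally`, so
    logs are still retrieved if an action inside the block raises.
    """
    if kind not in ("console", "network"):
        raise ValueError(f"unknown logging kind: {kind!r}")
    captured: dict[str, Any] = {}
    call_tool(f"browser.start_{kind}_logging")
    try:
        yield captured
    finally:
        # Store whatever the stop action returns (saved logs or a HAR file).
        captured["result"] = call_tool(f"browser.stop_{kind}_logging")


# Usage with a fake call_tool that just records which actions ran.
calls: list[str] = []


def fake_call_tool(name: str, **args: Any) -> Any:
    calls.append(name)
    # Pretend stop_* actions return a log file path (made-up value).
    return "saved-logs.har" if name.startswith("browser.stop_") else None


with logging_session(fake_call_tool, "network") as logs:
    fake_call_tool("browser.navigate", url="http://localhost:3000")
    fake_call_tool("browser.click", index=5)

print(calls)
print(logs["result"])  # saved-logs.har
```

Putting the stop call in finally matters most for the debugging use case: the run that crashes is usually the one whose logs you need.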

Relationship with browser-use

browser and browser-use are complementary, not mutually exclusive:

browser-use: High-level, task-oriented sub-agent (you describe the goal; the agent figures out the steps)
browser: Low-level, precise control (explicit step-by-step operations)

When to prefer browser-use:

Complex tasks with long workflows
When describing the goal is easier than specifying steps

When to prefer browser:

Need precise control over each action
Want to verify intermediate results
browser-use failed or got stuck (use as fallback)

Best practice: Try browser-use first for complex tasks, and fall back to browser for manual control if needed.
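The fallback rule can be expressed as a small wrapper. Sketch only: run_browser_use and run_manual are hypothetical callables standing in for delegating to the browser-use agent and for driving browser step by step. Treating an exception or a None result as "failed or got stuck" is an assumption; detecting "stuck" in practice may need a timeout.

```python
from typing import Callable, Optional


def run_with_fallback(
    goal: str,
    run_browser_use: Callable[[str], Optional[str]],
    run_manual: Callable[[str], str],
) -> tuple[str, str]:
    """Try the high-level agent first; fall back to manual browser control.

    Returns (which_path_produced_the_result, result).
    """
    try:
        result = run_browser_use(goal)
    except Exception:
        # Any error from the agent counts as "failed".
        result = None
    if result is not None:
        return ("browser-use", result)
    # None (or an error) counts as "failed or got stuck": switch to step-by-step control.
    return ("browser", run_manual(goal))


# Usage with hypothetical stand-ins.
def stuck_agent(goal: str) -> Optional[str]:
    raise TimeoutError("agent made no progress")


def manual(goal: str) -> str:
    return f"done manually: {goal}"


print(run_with_fallback("log in to the dashboard", stuck_agent, manual))
# ('browser', 'done manually: log in to the dashboard')
```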

Best Practices

Call get_browser_state first to see available elements
Use browser.get_element_info to inspect element details (xpath, attributes, etc.)
Use element indices/IDs from the browser state
Tab IDs are the last 4 characters of the target_id
Use browser.extract for LLM-based extraction from the page markdown
Use browser.evaluate for custom JavaScript operations

Manual Control Pattern (Fallback from browser-use)

🎯 Iterative Control Loop

Use this pattern when browser-use fails or you need precise control:
1. get_browser_state    → See page state, available elements
2. browser.{action}     → Perform action (click, input, navigate...)
3. get_browser_state    → Verify result, plan next step
4. Repeat until complete
This pattern works for any task (form filling, navigation, data extraction, etc.). It is the manual alternative to browser-use's autonomous approach.
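The four-step loop can be sketched in Python. Here call_tool(name, **args) is a hypothetical stand-in for invoking the actions, and plan_next represents whatever decides the next action from the latest state (normally the agent itself). The max_steps guard is an added assumption so that a loop that never converges stops with an error instead of running forever.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical: call_tool(action_name, **arguments) -> result.
ToolCaller = Callable[..., Any]


@dataclass
class Step:
    action: str  # e.g. "browser.click"
    args: dict[str, Any] = field(default_factory=dict)


# Decides the next Step from the latest state; returns None when done.
Planner = Callable[[Any], Optional[Step]]


def control_loop(call_tool: ToolCaller, plan_next: Planner, max_steps: int = 20) -> Any:
    state = call_tool("get_browser_state")  # 1. see page state
    for _ in range(max_steps):
        step = plan_next(state)
        if step is None:  # 4. task complete
            return state
        call_tool(step.action, **step.args)  # 2. perform one action
        state = call_tool("get_browser_state")  # 3. verify result, plan next step
    raise RuntimeError(f"task not finished after {max_steps} steps")


# Usage with a fake browser whose state counts the actions performed so far.
history: list[str] = []


def fake_call_tool(name: str, **args: Any) -> Any:
    history.append(name)
    return {"actions_done": sum(1 for n in history if n != "get_browser_state")}


plan = [
    Step("browser.navigate", {"url": "https://example.com/login"}),
    Step("browser.click", {"index": 3}),
]


def plan_next(state: Any) -> Optional[Step]:
    done = state["actions_done"]
    return plan[done] if done < len(plan) else None


print(control_loop(fake_call_tool, plan_next))  # {'actions_done': 2}
print(history)
```

Re-reading the state after every action, rather than assuming an action succeeded, is what lets the loop verify each step before choosing the next one.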
