CODEBUDDY.md
This file provides guidance to CodeBuddy Code when working with code in this repository.
Project Overview
browser-use is a Rust library for browser automation via Chrome DevTools Protocol (CDP). It provides:
- A browser session manager wrapping headless_chrome
- A tool system for common browser operations (navigate, click, input, extract, etc.)
- DOM extraction with indexed interactive elements
- An MCP (Model Context Protocol) server for AI-driven browser automation
Common Commands
Building
cargo build # Build library
cargo build --bin mcp-server # Build MCP server binary
cargo build --release # Production build
Testing
cargo test # Run unit tests only
cargo test -- --ignored # Run integration tests (requires Chrome installed)
cargo test --test dom_integration # Run one integration test file (tests/dom_integration.rs)
Running
cargo run --bin mcp-server # Run MCP server (headless)
cargo run --bin mcp-server -- --headed # Run with visible browser
Development
cargo check # Fast compile check
cargo clippy # Linting
cargo fmt # Format code
Architecture
Module Structure
The codebase is organized into five main modules:
1. browser/ - Browser Management
- session.rs: BrowserSession wraps headless_chrome::Browser and manages tabs
- config.rs: LaunchOptions and ConnectionOptions for browser initialization
- Key APIs: launch(), connect(), navigate(), extract_dom()
2. dom/ - DOM Extraction & Indexing
- tree.rs: DomTree represents page structure with indexed interactive elements
- element.rs: ElementNode is a serializable DOM node with visibility/interactivity metadata
- extract_dom.js: JavaScript injected into pages to extract DOM as JSON
- Flow: JS extraction → JSON → ElementNode tree → index interactive elements → DomTree.selectors
3. tools/ - Browser Automation Tools
- Each tool is in its own file: navigate.rs, click.rs, input.rs, extract.rs, screenshot.rs, evaluate.rs, wait.rs
- All tools implement the Tool trait with type-safe parameter structs (e.g., ClickParams, NavigateParams)
- ToolRegistry manages tools and executes them with ToolContext (contains BrowserSession + optional cached DomTree)
- Element selection: tools accept either CSS selectors OR numeric indices (from DomTree)
- ⚠️ IMPORTANT: When adding a new tool, remember to register it in src/mcp/mod.rs using the register_mcp_tools! macro
4. mcp/ - Model Context Protocol Server
- handler.rs: BrowserServer wraps BrowserSession in Arc<Mutex<>> for thread-safe MCP access
- mod.rs: Uses register_mcp_tools! macro to auto-generate MCP tool wrappers from internal tools
- Runs as stdio-based MCP server via rmcp crate
5. error.rs - Error Handling
- BrowserError enum with variants for launch/connection/navigation/DOM/tool failures
- Converts anyhow::Error from headless_chrome and serde_json::Error
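The error-conversion pattern described above can be sketched with the standard library alone. This is an illustration, not the contents of error.rs: the variant names are hypothetical, the real crate uses thiserror, and std::num::ParseIntError stands in for a foreign error such as serde_json::Error so that the block compiles without dependencies.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical variant names; the real enum in error.rs may differ.
#[derive(Debug)]
pub enum BrowserError {
    LaunchFailed(String),
    ConnectionFailed(String),
    NavigationFailed(String),
    DomParseFailed(String),
    ToolExecutionFailed { tool: String, reason: String },
}

impl fmt::Display for BrowserError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BrowserError::LaunchFailed(msg) => write!(f, "launch failed: {}", msg),
            BrowserError::ConnectionFailed(msg) => write!(f, "connection failed: {}", msg),
            BrowserError::NavigationFailed(msg) => write!(f, "navigation failed: {}", msg),
            BrowserError::DomParseFailed(msg) => write!(f, "DOM parse failed: {}", msg),
            BrowserError::ToolExecutionFailed { tool, reason } => {
                write!(f, "tool '{}' failed: {}", tool, reason)
            }
        }
    }
}

impl Error for BrowserError {}

// Stand-in for `From<serde_json::Error>`: a foreign parse error is mapped
// onto one variant so that `?` can be used at call sites.
impl From<std::num::ParseIntError> for BrowserError {
    fn from(err: std::num::ParseIntError) -> Self {
        BrowserError::DomParseFailed(err.to_string())
    }
}

/// Hypothetical helper showing `?` converting the foreign error.
pub fn parse_index(raw: &str) -> Result<usize, BrowserError> {
    Ok(raw.trim().parse::<usize>()?)
}
```

With a From impl per foreign error type, call sites can use `?` and still return a single error enum.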
Key Design Patterns
Tool System: The Tool trait uses associated types for compile-time parameter validation:
trait Tool {
    type Params: Serialize + Deserialize + JsonSchema;
    fn execute_typed(&self, params: Self::Params, context: &mut ToolContext) -> Result<ToolResult>;
}
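A dependency-free sketch of the same idea follows. It assumes a hypothetical FromRaw trait and RawParams map in place of the serde and schemars bounds, and a toy ToolContext with a single field; none of these names come from the crate. The point is only to show untyped input being parsed into the associated Params type before execute_typed runs.

```rust
use std::collections::HashMap;

/// Untyped input, standing in for the JSON value an MCP client would send.
pub type RawParams = HashMap<String, String>;

/// Hypothetical stand-in for the Serialize + Deserialize + JsonSchema bounds.
pub trait FromRaw: Sized {
    fn from_raw(raw: &RawParams) -> Result<Self, String>;
}

#[derive(Debug, PartialEq)]
pub struct ToolResult {
    pub success: bool,
    pub message: String,
}

/// Toy context; the real ToolContext holds a BrowserSession and an optional DomTree.
#[derive(Debug, Default)]
pub struct ToolContext {
    pub current_url: Option<String>,
}

pub trait Tool {
    type Params: FromRaw;

    fn execute_typed(
        &self,
        params: Self::Params,
        context: &mut ToolContext,
    ) -> Result<ToolResult, String>;

    /// Parse untyped input into `Self::Params`, then delegate to the typed method.
    fn execute(&self, raw: &RawParams, context: &mut ToolContext) -> Result<ToolResult, String> {
        let params = <Self::Params as FromRaw>::from_raw(raw)?;
        self.execute_typed(params, context)
    }
}

pub struct NavigateParams {
    pub url: String,
}

impl FromRaw for NavigateParams {
    fn from_raw(raw: &RawParams) -> Result<Self, String> {
        let url = raw
            .get("url")
            .ok_or_else(|| "missing field: url".to_string())?;
        Ok(NavigateParams { url: url.clone() })
    }
}

pub struct NavigateTool;

impl Tool for NavigateTool {
    type Params = NavigateParams;

    fn execute_typed(
        &self,
        params: NavigateParams,
        context: &mut ToolContext,
    ) -> Result<ToolResult, String> {
        // A real tool would drive the browser; this sketch only records the URL.
        context.current_url = Some(params.url.clone());
        Ok(ToolResult {
            success: true,
            message: format!("navigated to {}", params.url),
        })
    }
}
```

Because each tool names its own Params type, a missing or malformed field is rejected in one place (the parse step) and execute_typed only ever sees a fully formed struct.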
DOM Indexing: Interactive elements get numeric indices for easier LLM targeting:
- Extract DOM → Traverse tree → Detect interactive elements (buttons, links, inputs)
- Assign indices only to visible + interactive elements
- Tools can use {"index": 5} instead of complex CSS selectors
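The indexing steps above can be sketched as a depth-first traversal. The field names, the node helper, and the selector_for rule (prefer #id, fall back to the tag name) are assumptions made for illustration; the real ElementNode and DomTree carry more metadata and may build selectors differently.

```rust
/// Simplified node; field names are assumptions, not the crate's definition.
#[derive(Debug, Clone)]
pub struct ElementNode {
    pub tag: String,
    pub id: Option<String>,
    pub visible: bool,
    pub interactive: bool,
    pub index: Option<usize>,
    pub children: Vec<ElementNode>,
}

pub struct DomTree {
    pub root: ElementNode,
    /// selectors[i] is the selector of the element that received index i.
    pub selectors: Vec<String>,
}

impl DomTree {
    /// Walk the tree once and index every visible, interactive element.
    pub fn build(mut root: ElementNode) -> Self {
        let mut selectors = Vec::new();
        assign_indices(&mut root, &mut selectors);
        DomTree { root, selectors }
    }
}

fn assign_indices(node: &mut ElementNode, selectors: &mut Vec<String>) {
    if node.visible && node.interactive {
        node.index = Some(selectors.len());
        selectors.push(selector_for(node));
    }
    for child in &mut node.children {
        assign_indices(child, selectors);
    }
}

/// Hypothetical selector rule: use the id when present, else the tag name.
fn selector_for(node: &ElementNode) -> String {
    match &node.id {
        Some(id) => format!("#{}", id),
        None => node.tag.clone(),
    }
}

/// Hypothetical convenience constructor for building test trees.
pub fn node(
    tag: &str,
    id: Option<&str>,
    visible: bool,
    interactive: bool,
    children: Vec<ElementNode>,
) -> ElementNode {
    ElementNode {
        tag: tag.to_string(),
        id: id.map(|s| s.to_string()),
        visible,
        interactive,
        index: None,
        children,
    }
}
```

Hidden or non-interactive nodes keep index: None, so the index space stays small and dense for the LLM.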
Dual Element Selection: Tools accept both:
- CSS selector: {"selector": "#submit-btn"}
- Numeric index: {"index": 5} (requires DOM extraction first)
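One way to model the two forms is an enum that is resolved to a CSS selector before the tool acts. ElementRef and resolve are hypothetical names for this sketch; the crate's parameter structs may instead carry two optional fields.

```rust
/// Hypothetical representation of the two accepted targeting forms.
#[derive(Debug, Clone, PartialEq)]
pub enum ElementRef {
    Selector(String),
    Index(usize),
}

/// Resolve a target to a CSS selector.
/// `selectors` is the list produced by the most recent DOM extraction,
/// or `None` if the DOM has not been extracted yet.
pub fn resolve(target: &ElementRef, selectors: Option<&[String]>) -> Result<String, String> {
    match target {
        // A selector is passed through unchanged.
        ElementRef::Selector(css) => Ok(css.clone()),
        // An index is only meaningful against an extracted DOM.
        ElementRef::Index(i) => {
            let list = selectors
                .ok_or_else(|| "index used before DOM extraction".to_string())?;
            list.get(*i)
                .cloned()
                .ok_or_else(|| format!("index {} out of range (len {})", i, list.len()))
        }
    }
}
```

Resolving indices against the latest selector list also makes the staleness rule explicit: an index is interpreted against whichever extraction is current.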
MCP Integration: The register_mcp_tools! macro automatically wraps internal tools:
- Takes tool type + MCP name + description
- Generates async function that locks session, calls tool, converts result
- All registered in tool_router for rmcp dispatcher
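The shape of such a macro can be sketched with macro_rules! and the standard library. This is not the register_mcp_tools! macro itself: the macro name, the simplified Tool trait, and the string-based input and output are assumptions, and the generated methods are synchronous because the sketch avoids tokio and rmcp.

```rust
use std::sync::{Arc, Mutex};

/// Toy session; the real BrowserSession wraps a headless_chrome::Browser.
#[derive(Default)]
pub struct BrowserSession {
    pub log: Vec<String>,
}

/// Simplified tool trait for this sketch only.
pub trait Tool: Default {
    fn execute(&self, input: &str, session: &mut BrowserSession) -> Result<String, String>;
}

#[derive(Default)]
pub struct NavigateTool;

impl Tool for NavigateTool {
    fn execute(&self, input: &str, session: &mut BrowserSession) -> Result<String, String> {
        if input.is_empty() {
            return Err("missing url".to_string());
        }
        session.log.push(format!("navigate {}", input));
        Ok(format!("navigated to {}", input))
    }
}

pub struct BrowserServer {
    pub session: Arc<Mutex<BrowserSession>>,
}

impl BrowserServer {
    pub fn new() -> Self {
        BrowserServer { session: Arc::new(Mutex::new(BrowserSession::default())) }
    }
}

/// Hypothetical macro: for each (tool type, MCP name, description) entry,
/// generate a method that locks the session, runs the tool, and converts the result.
macro_rules! register_tools_sketch {
    ($server:ident, $( $method:ident => ($tool:ty, $name:literal, $desc:literal) ),* $(,)?) => {
        impl $server {
            $(
                #[doc = $desc]
                pub fn $method(&self, input: &str) -> String {
                    let mut session = self.session.lock().expect("session mutex poisoned");
                    match <$tool>::default().execute(input, &mut session) {
                        Ok(msg) => format!("{}: {}", $name, msg),
                        Err(err) => format!("{} failed: {}", $name, err),
                    }
                }
            )*

            /// Names of all generated wrappers, in registration order.
            pub fn tool_names() -> Vec<&'static str> {
                vec![$($name),*]
            }
        }
    };
}

register_tools_sketch!(
    BrowserServer,
    browser_navigate => (NavigateTool, "browser_navigate", "Navigate to a URL"),
);
```

Adding a tool then means adding one line to the macro invocation, which is why forgetting to register a new tool leaves it invisible to MCP clients.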
Testing Approach
- Unit tests in each module for struct/enum behavior
- Integration tests in tests/ require Chrome (#[ignore] attribute)
- Run ignored tests with: cargo test -- --ignored
- Tests use data: URLs to avoid network dependencies
Important Implementation Notes
- The MCP server runs in a single-threaded Tokio runtime (#[tokio::main(flavor = "current_thread")])
- BrowserSession holds a headless_chrome::Browser and manages one active tab at a time
- DOM extraction executes JavaScript in the browser and parses the returned JSON
- All tools work on the active tab; use switch_tab() to change context
- Element indices are only valid for the specific DOM extraction they came from
- Re-extracting the DOM rebuilds the selector list on DomTree and reassigns all indices
- When writing JavaScript to be executed in the browser, always use JSON.stringify() to ensure the result is returned properly; this prevents issues with complex objects and ensures consistent serialization
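As a small illustration of the JSON.stringify() rule, a helper can wrap an expression before it is sent to the browser. The function name stringify_wrapped is hypothetical and is not part of the crate; the sketch only builds the script string and does not evaluate it.

```rust
/// Wrap a JavaScript expression so the browser returns a JSON string.
/// Hypothetical helper: trims whitespace and a trailing semicolon so the
/// expression can sit inside the JSON.stringify(...) call.
pub fn stringify_wrapped(expression: &str) -> String {
    let expr = expression.trim().trim_end_matches(';').trim_end();
    format!("JSON.stringify({})", expr)
}
```

The Rust side can then treat every evaluation result as a JSON string and parse it with serde_json, regardless of whether the expression produced a number, an array, or a nested object.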
Crate Dependencies
- headless_chrome: CDP client for Chrome/Chromium automation
- rmcp: Model Context Protocol (MCP) server framework
- serde / serde_json: JSON serialization for params and DOM
- schemars: JSON Schema generation for tool parameters
- thiserror: Ergonomic error definitions
- tokio (optional): Async runtime for MCP server
- clap (optional): CLI arg parsing for MCP server binary
File Locations
- MCP server binary: src/bin/mcp_server.rs
- DOM extraction script: src/dom/extract_dom.js (embedded via include_str!)
- Integration tests: tests/dom_integration.rs