agent-browser: CLI Browser Automation
Vercel’s headless browser automation CLI, designed for AI agents. It selects elements by refs (@e1, @e2) taken from accessibility snapshots.
Setup Check
```bash
# Check installation
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED - run: npm install -g agent-browser && agent-browser install"
```
Install if needed
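In a setup script, the installation check and the install commands can be combined so that the install runs only when the CLI is missing. This is a minimal sketch. `ensure_agent_browser` is a hypothetical helper name and is not part of agent-browser.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of agent-browser): install the CLI and its
# Chromium download only when `agent-browser` is not already available.
ensure_agent_browser() {
  if command -v agent-browser >/dev/null 2>&1; then
    return 0  # already installed, nothing to do
  fi
  npm install -g agent-browser && agent-browser install
}
```

Call `ensure_agent_browser` near the top of a script, before any other agent-browser command.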
```bash
npm install -g agent-browser
agent-browser install  # Downloads Chromium
```
Core Workflow
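The snapshot + ref pattern works well for LLMs. The agent reads the interactive elements from a snapshot and then acts on them by short ref IDs, without having to compose selectors: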
1. Navigate to a URL
2. Take a snapshot to get interactive elements with refs
3. Interact using refs (@e1, @e2, etc.)
4. Re-snapshot after navigation or DOM changes
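A script can easily forget the last step. This minimal sketch shows a hypothetical wrapper, `ab_act` (not part of agent-browser), that runs one interaction and then re-snapshots. That way the caller always works from current refs.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of agent-browser): run one interaction,
# then print a fresh interactive snapshot, since refs from an older
# snapshot may not match the page after navigation or DOM changes.
ab_act() {
  # Forward the interaction unchanged, e.g.: ab_act click @e1
  agent-browser "$@" || return $?
  agent-browser snapshot -i
}
```

For example, `ab_act fill @e2 "search query"` fills the field and prints the updated snapshot. If the interaction fails, the wrapper skips the snapshot and returns the failing exit status.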
```bash
# Step 1: Open URL
agent-browser open https://example.com

# Step 2: Get interactive elements with refs
agent-browser snapshot -i --json

# Step 3: Interact using refs
agent-browser click @e1
agent-browser fill @e2 "search query"

# Step 4: Re-snapshot after changes
agent-browser snapshot -i
```
Key Commands
Navigation
```bash
agent-browser open <url>   # Navigate to URL
agent-browser back         # Go back
agent-browser forward      # Go forward
agent-browser reload       # Reload page
agent-browser close        # Close browser
```
Snapshots (Essential for AI)
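Scripts usually need to turn a snapshot into a ref. This minimal sketch defines a hypothetical `ref_for` helper. It assumes the text format used in this page's examples (`- button "Submit" [ref=e1]`), so check that format against your agent-browser version before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ref (e.g. @e3) of the first snapshot line
# containing <role> "<name>", assuming lines look like:
#   - button "Sign in" [ref=e3]
# Returns 1 if no matching element is found.
ref_for() {
  local role="$1" name="$2" ref
  ref="$(agent-browser snapshot -i \
    | grep -F -- "$role \"$name\"" \
    | sed -n 's/.*\[ref=\(e[0-9][0-9]*\)].*/@\1/p' \
    | head -n 1)"
  [ -n "$ref" ] || return 1
  printf '%s\n' "$ref"
}
```

Usage: `agent-browser click "$(ref_for button "Sign in")"`. Each call takes a new snapshot, so the ref it returns reflects the current page.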
```bash
agent-browser snapshot              # Full accessibility tree
agent-browser snapshot -i           # Interactive elements only (recommended)
agent-browser snapshot -i --json    # JSON output for parsing
agent-browser snapshot -c           # Compact (remove empty elements)
agent-browser snapshot -d 3         # Limit depth
```
Interactions
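`fill` clears an input before typing into it, so a form with several text inputs can be filled in one loop. This minimal sketch uses `fill_fields`, a hypothetical helper. The refs in its usage comment are placeholders that would come from a snapshot.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill several inputs from REF VALUE pairs, e.g.
#   fill_fields @e1 "John Doe" @e2 "john@example.com"
# Stops at the first failing fill and returns its exit status.
fill_fields() {
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "usage: fill_fields REF VALUE [REF VALUE ...]" >&2
    return 2
  fi
  while [ $# -gt 0 ]; do
    agent-browser fill "$1" "$2" || return $?
    shift 2
  done
}
```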
```bash
agent-browser click @e1              # Click element
agent-browser dblclick @e1           # Double-click
agent-browser fill @e1 "text"        # Clear and fill input
agent-browser type @e1 "text"        # Type without clearing
agent-browser press Enter            # Press key
agent-browser hover @e1              # Hover element
agent-browser check @e1              # Check checkbox
agent-browser uncheck @e1            # Uncheck checkbox
agent-browser select @e1 "option"    # Select dropdown option
agent-browser scroll down 500        # Scroll (up/down/left/right)
agent-browser scrollintoview @e1     # Scroll element into view
```
Get Information
```bash
agent-browser get text @e1           # Get element text
agent-browser get html @e1           # Get element HTML
agent-browser get value @e1          # Get input value
agent-browser get attr href @e1      # Get attribute
agent-browser get title              # Get page title
agent-browser get url                # Get current URL
agent-browser get count "button"     # Count matching elements
```
Screenshots & PDFs
```bash
agent-browser screenshot                    # Viewport screenshot
agent-browser screenshot --full             # Full page
agent-browser screenshot output.png         # Save to file
agent-browser screenshot --full output.png  # Full page to file
agent-browser pdf output.pdf                # Save as PDF
```
Waiting
```bash
agent-browser wait @e1      # Wait for element
agent-browser wait 2000     # Wait milliseconds
agent-browser wait "text"   # Wait for text to appear
```
Semantic Locators (Alternative to Refs)
```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign up" click
agent-browser find label "Email" fill "user@example.com"
agent-browser find placeholder "Search..." fill "query"
```
Sessions (Parallel Browsers)
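Each `--session` is an independent browser, so commands for different sessions can run as background jobs. This is a minimal sketch that assumes the CLI tolerates concurrent invocations for different sessions. The helper name and the `s1`, `s2`, ... session names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: open each URL in its own session (s1, s2, ...) as a
# background job, wait for every open to finish, then print one
# "<session> <page title>" line per session. Requires bash (arrays).
open_in_sessions() {
  local pids=() pid i=0 url status=0
  for url in "$@"; do
    i=$((i + 1))
    agent-browser --session "s$i" open "$url" >/dev/null &
    pids+=("$!")
  done
  # Wait on each PID so a failed open is noticed (bare `wait` returns 0)
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  [ "$status" -eq 0 ] || return 1
  for ((i = 1; i <= $#; i++)); do
    printf 's%d %s\n' "$i" "$(agent-browser --session "s$i" get title)"
  done
}
```

Usage: `open_in_sessions https://site1.com https://site2.com`. Waiting on each job ID, rather than calling `wait` once, lets the helper report failure if any single open fails.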
Section titled “Sessions (Parallel Browsers)”# Run multiple independent browser sessionsagent-browser --session browser1 open https://site1.comagent-browser --session browser2 open https://site2.com
# List active sessionsagent-browser session listExamples
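Refs come from a particular snapshot and must be refreshed after the page changes. For that reason, a script that runs unattended may prefer the semantic locators listed under Key Commands. This minimal sketch shows the login flow below rewritten that way. `login` is a hypothetical function name, and the "Email", "Password", and "Sign in" names are assumed to match the snapshot shown in that example.

```bash
#!/usr/bin/env bash
# Hypothetical login function using semantic locators (find label / find role)
# instead of refs. The "Email", "Password" and "Sign in" names are assumed
# from the example snapshot; adjust them to the real page.
login() {
  local url="$1" email="$2" password="$3"
  agent-browser open "$url" &&
    agent-browser find label "Email" fill "$email" &&
    agent-browser find label "Password" fill "$password" &&
    agent-browser find role button click --name "Sign in" &&
    agent-browser wait 2000 &&
    agent-browser get url  # print where the login landed
}
```

Usage: `login https://app.example.com/login "$APP_EMAIL" "$APP_PASSWORD"`, where `APP_EMAIL` and `APP_PASSWORD` are hypothetical environment variables that keep credentials out of the script file.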
Login Flow
```bash
agent-browser open https://app.example.com/login
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait 2000
agent-browser snapshot -i  # Verify logged in
```
Search and Extract
```bash
agent-browser open https://news.ycombinator.com
agent-browser snapshot -i --json
# Parse the JSON to find story links (ref numbers such as @e12 vary per page)
agent-browser get text @e12  # Get headline text
agent-browser click @e12     # Click to open story
```
Form Filling
```bash
agent-browser open https://forms.example.com
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser select @e3 "United States"
agent-browser check @e4  # Agree to terms
agent-browser click @e5  # Submit button
agent-browser screenshot confirmation.png
```
Debug Mode
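`--headed` runs the browser in a visible window. A wrapper can add the flag from an environment variable, which switches a whole script between headless and headed runs without editing each command. This is a minimal sketch in which `ab` and `AB_HEADED` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: prepend --headed to every agent-browser call when
# AB_HEADED is set to a non-empty value, e.g. AB_HEADED=1 ./script.sh
ab() {
  if [ -n "${AB_HEADED:-}" ]; then
    agent-browser --headed "$@"
  else
    agent-browser "$@"
  fi
}
```

A script then calls `ab open https://example.com`, `ab snapshot -i`, and so on. It stays headless unless `AB_HEADED` is set.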
```bash
# Run with visible browser window
agent-browser --headed open https://example.com
agent-browser --headed snapshot -i
agent-browser --headed click @e1
```
JSON Output
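A script can read refs straight out of `--json` output with jq, a separate tool that is assumed to be installed. This minimal sketch assumes the response layout documented in this section, where `data.refs` maps each ref ID to its `name` and `role`. `list_refs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one "@ref<TAB>role<TAB>name" line per element
# from `snapshot -i --json`. Requires jq; exits non-zero if success is false.
list_refs() {
  agent-browser snapshot -i --json |
    jq -r 'if .success then
             .data.refs | to_entries[] | "@\(.key)\t\(.value.role)\t\(.value.name)"
           else
             error("snapshot failed")
           end'
}
```

The tab-separated output can then be filtered with `grep` or `awk` to pick a ref by role or name.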
Add `--json` for structured output:
```bash
agent-browser snapshot -i --json
```
Returns:
```json
{
  "success": true,
  "data": {
    "refs": {
      "e1": {"name": "Submit", "role": "button"},
      "e2": {"name": "Email", "role": "textbox"}
    },
    "snapshot": "- button \"Submit\" [ref=e1]\n- textbox \"Email\" [ref=e2]"
  }
}
```
vs Playwright MCP
| Feature | agent-browser (CLI) | Playwright MCP |
|---|---|---|
| Interface | Bash commands | MCP tools |
| Selection | Refs (@e1) | Refs (e1) |
| Output | Text/JSON | Tool responses |
| Parallel | Sessions | Tabs |
| Best for | Quick automation | Tool integration |
Use agent-browser when:
- You prefer Bash-based workflows
- You want simpler CLI commands
- You need quick one-off automation
Use Playwright MCP when:
- You need deep MCP tool integration
- You want tool-based responses
- You’re building complex automation