Oflight Inc.
AI | 2026-04-05

Vercel Agent Browser Complete Guide — AI Agent Browser Automation CLI Setup & Usage [2026]

Vercel Agent Browser is a browser automation CLI built specifically for AI agents. It uses 82.5% fewer tokens than Playwright MCP, is written in Rust for native speed, and works with Claude Code, Cursor, and any other AI agent that can run shell commands. This guide covers setup, the core workflow, and production usage.


What is Agent Browser? — Complete Overview of AI Agent-Specific Browser Automation CLI

Agent Browser is a browser automation CLI developed by Vercel Labs and designed specifically for AI agents. Traditional tools such as Playwright and Puppeteer are designed for human developers; Agent Browser is optimized for direct operation by large language model (LLM)-driven agents such as Claude Code, Cursor, and GitHub Copilot. It is built in Rust and released under the Apache 2.0 license, and as of 2026 it has over 25,000 GitHub stars and more than 310,000 weekly npm downloads.

Its key feature is the "accessibility ref" system: elements are identified by short refs (e.g., @e1, @e2) instead of CSS/XPath selectors. This lets AI agents issue precise commands with minimal tokens, an 82.5% token reduction compared to Playwright MCP. Because the tool uses a client-daemon architecture, browser sessions persist between commands, so subsequent operations skip browser startup and run quickly. Development is active: more than 60 versions were released in the three months between v0.6.0 in January 2026 and April 2026.

Key Differences from Traditional Tools — Comparison with Playwright and Puppeteer

Agent Browser fundamentally differs from Playwright and Puppeteer in being "designed for AI agents." The following comparison table highlights the major differences.

| Aspect | Agent Browser | Playwright | Puppeteer |
| --- | --- | --- | --- |
| Primary Users | AI Agents | Developers | Developers |
| Element Selection | Accessibility refs (@e1) | CSS/XPath selectors | CSS selectors |
| Token Efficiency | 82.5% fewer than Playwright MCP | Standard | Standard |
| Supported Languages | CLI (All languages) | Node.js/Python/Java/.NET | Node.js only |
| Architecture | Client-Daemon (Rust) | Library-based | Library-based |
| Session Persistence | Automatic (daemon-managed) | Manual management required | Manual management required |
| Installation Methods | npm/Homebrew/Cargo | npm/yarn | npm/yarn |
| Chrome Management | Automatic (Chrome for Testing) | Manual setup | Chromium bundled |

While Playwright and Puppeteer assume developers will write code to build test suites, Agent Browser is a CLI tool that AI agents can directly execute as shell commands. This eliminates the need for API calls or library imports, allowing AI coding assistants like Claude Code and Cursor to use it directly. The ref-based approach using accessibility trees ensures stable operations even on dynamically changing web pages and significantly reduces the token count of AI-generated prompts.

Technical Architecture — Rust Daemon and Client Separation Design

Agent Browser adopts a three-layer architecture:

1. Rust CLI (Client): The command-line interface executed by users (or AI agents). Lightweight and fast to start.
2. Native Rust Daemon: Runs in the background and manages sessions with Chrome processes. The native Rust daemon was introduced in v0.16.0, and the full Rust implementation was completed in v0.20.0.
3. Chrome DevTools Protocol: The daemon and the Chrome browser communicate via CDP for fast low-level operations.

This client-daemon separation means that after launching a browser with `agent-browser open https://example.com`, subsequent commands like `agent-browser snapshot` and `agent-browser click @e1` do not require a browser restart. The daemon maintains session state (login information, cookies, local storage, etc.), significantly accelerating authentication flows and multi-step operations.

Furthermore, v0.23.0 introduced an Observability dashboard, allowing real-time monitoring of command execution history, screenshots, and network traffic via a web UI. This makes debugging and auditing AI agents much easier.

Installation and Initial Setup — Three Options: npm, Homebrew, Cargo

Installing Agent Browser is simple. You can choose from three methods:

Via npm (Recommended)

```bash
npm install -g agent-browser
agent-browser install
```

Homebrew (macOS/Linux)

```bash
brew tap vercel/tap
brew install agent-browser
agent-browser install
```

Cargo (for Rust developers)

```bash
cargo install agent-browser
agent-browser install
```

Running `agent-browser install` automatically downloads and installs Chrome for Testing. This is an independent environment from your regular Chrome browser, so it won't affect normal browsing. For Windows, using WSL2 is recommended, though native installation via npm/cargo is also possible.

On first launch, the daemon starts automatically and remains in the background. You can check daemon status with `agent-browser status`. Setting the environment variable `NO_COLOR=1` disables ANSI color codes, producing machine-readable output that AI agents can parse.

Snapshot-Ref Workflow — The Core Pattern of Agent Browser

The core AI operation pattern of Agent Browser is the "Snapshot-Ref Workflow," consisting of four steps:

Step 1: Open a page

```bash
agent-browser open https://example.com/login
```

Step 2: Take snapshot (accessibility tree)

```bash
agent-browser snapshot -i
```

This outputs an accessibility tree with all interactive elements assigned refs (e.g., @e1, @e2, @e3). AI agents parse this tree to identify operation targets.

Step 3: Element operations (click, type, etc.)

```bash
agent-browser click @e2
agent-browser type @e3 "username"
agent-browser type @e4 "password"
agent-browser click @e5
```

Step 4: Visual confirmation (screenshot)

```bash
agent-browser screenshot -o result.png
```

This workflow enables AI agents to autonomously execute the loop "observe (snapshot) → decide (AI processing) → act (click/type) → verify (screenshot)." In particular, because `snapshot -i` returns ready-to-use refs, the agent never has to generate CSS/XPath selectors, so equivalent operations need roughly one-sixth the tokens of Playwright MCP.
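To make the "observe → decide" part of the loop concrete, here is a minimal JavaScript sketch of how an agent-side script might pick a ref out of snapshot text. The sample snapshot lines and the `[ref=eN]` notation are assumptions for illustration (this guide does not show the exact output format), and `parseSnapshot` and `findRef` are hypothetical helper names, not part of Agent Browser.

```javascript
// Hypothetical sample of `agent-browser snapshot -i` output.
// The real output format may differ; adjust the pattern below accordingly.
const sampleSnapshot = [
  '- textbox "Username" [ref=e3]',
  '- textbox "Password" [ref=e4]',
  '- button "Sign in" [ref=e5]',
].join("\n");

// Parse each line into { role, name, ref } so a target can be chosen
// by role and accessible name instead of by CSS/XPath selector.
function parseSnapshot(text) {
  const linePattern = /^\s*-\s+(\w+)\s+"([^"]*)"\s+\[ref=(e\d+)\]/;
  const elements = [];
  for (const line of text.split("\n")) {
    const match = linePattern.exec(line);
    if (match) {
      elements.push({ role: match[1], name: match[2], ref: "@" + match[3] });
    }
  }
  return elements;
}

// Return the ref of the first element with the given role and name
// (name comparison is case-insensitive), or null if none matches.
function findRef(elements, role, name) {
  const hit = elements.find(
    (el) => el.role === role && el.name.toLowerCase() === name.toLowerCase()
  );
  return hit ? hit.ref : null;
}

const elements = parseSnapshot(sampleSnapshot);
console.log(findRef(elements, "button", "sign in")); // @e5
```

The returned ref can then be passed straight to a command such as `agent-browser click @e5`. In practice the LLM usually reads the snapshot directly; a parser like this is mainly useful for scripted checks around the agent.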

Key Features Overview — Navigation, Authentication Vault, Network Control

Agent Browser's main features organized by category:

| Feature Category | Command Examples | Description |
| --- | --- | --- |
| Navigation | `open`, `back`, `forward`, `reload` | Page navigation and history operations |
| Interaction | `click`, `type`, `press`, `drag`, `upload` | Click, input, drag & drop, file upload |
| Screenshots | `screenshot`, `screenshot --annotate` | Screen capture (with annotations available) |
| PDF Generation | `pdf -o output.pdf` | Convert entire page to PDF |
| Network | `network intercept`, `network har` | Request interception, HAR recording |
| Authentication Vault | `vault save`, `vault use` | Encrypted password storage and use (not passed to LLM) |
| Batch Execution | `batch < commands.txt` | Execute multiple commands via stdin |
| Session Management | `session save`, `session restore` | Save and restore browser state |
| Observability | `dashboard` | Launch web-based monitoring dashboard (v0.23.0+) |

Particularly noteworthy is the Authentication Vault feature. You can save a username and password locally, in encrypted form, with `agent-browser vault save mysite`, then use them with `agent-browser vault use mysite`. Because the raw password never appears in the AI agent's prompt, the risk of credential leakage is significantly reduced. Additionally, the HAR (HTTP Archive) recording feature captures network traffic in full for debugging and auditing purposes.

Supported AI Platforms — Ready to Use with Claude Code, Cursor, and Copilot

Agent Browser works with any AI agent capable of executing shell commands. Officially verified major platforms include:

- Claude Code: Anthropic's official CLI. Executes agent-browser commands directly via the `Bash` tool
- Cursor: VSCode-based AI editor. Operates via terminal
- GitHub Copilot: GitHub's official AI assistant. Available in Workspace
- OpenAI Codex: OpenAI API-based agents
- Google Gemini CLI: Gemini model command-line interface
- Goose: Open-source AI agent framework
- OpenCode: Code generation-focused agent
- Windsurf: Next-generation AI pair programming tool

On all these platforms, AI agents simply call `agent-browser` as a Bash command. No library installation or API authentication is required. The agent only needs an instruction such as "execute agent-browser open URL and retrieve page content with snapshot," and it runs automatically. For Claude Code in particular, Anthropic officially recommends Agent Browser integration, and there are numerous reported cases of using it for documentation generation and test execution.

Five Practical Use Cases — From Scraping to Enterprise Automation

Five real-world applications of Agent Browser:

1. Dynamic Web Scraping: Extract data from content generated dynamically with JavaScript (SPAs, infinite scroll, etc.). Collect data that traditional curl or wget cannot retrieve, allowing AI agents to analyze it directly.
2. Automated Form Filling: Complete complex forms with 30+ fields (insurance applications, government submissions, etc.) in approximately 90 seconds. Tasks taking 12+ minutes manually are accelerated by about 8x. AI agents identify fields from snapshots and automatically input appropriate values.
3. Automated Web Application Testing: After creating HTML prototypes with Claude Code, immediately verify functionality with Agent Browser. The AI autonomously executes complex test scenarios like "enter invalid password in login form and confirm error message appears."
4. Competitive Price Monitoring: Periodically retrieve product prices from multiple e-commerce sites and track price fluctuations. Agent Browser acts as a crawler, with the AI analyzing collected data and generating reports.
5. Enterprise Automation: Automate web-based SaaS dashboards like Salesforce and Workday. AI agents handle report generation, approval flows, and data exports. The Authentication Vault feature ensures secure credential management.

In these use cases, traditional approaches required writing scripts in Selenium or Playwright, whereas with Agent Browser an AI agent can complete the automation after a simple instruction such as "check this product price daily."

Vercel Sandbox Integration — Realizing Serverless Browser Automation

Agent Browser is natively integrated with "Vercel Sandbox" provided by Vercel. Vercel Sandbox is a serverless environment that safely executes code on ephemeral Linux VMs. Using the `@vercel/sandbox` package, you can execute Agent Browser in a serverless environment along these lines:

```javascript
import { Sandbox } from '@vercel/sandbox';

// Assumes agent-browser and Chrome are already installed in the sandbox (e.g., via a snapshot).
const sandbox = await Sandbox.create();

// `open` and `snapshot` are separate CLI commands, so run them one after the other.
await sandbox.runCommand({ cmd: 'agent-browser', args: ['open', 'https://example.com'] });
const result = await sandbox.runCommand({ cmd: 'agent-browser', args: ['snapshot', '-i'] });
console.log(await result.stdout());

await sandbox.stop();
```

The greatest advantage of Vercel Sandbox is snapshot startup. By creating snapshots of Agent Browser + Chrome in a running state, cold starts can be reduced to sub-second levels. This enables high-speed browser automation execution from Vercel Functions or Edge Functions.

Use case examples:

- Screenshot generation at API endpoints
- Dynamic OGP image generation
- Automated PDF invoice generation
- Cached delivery of web scraping results

Vercel Sandbox offers a free tier of up to 1,000 executions per month, with unlimited execution on Pro/Enterprise plans. Combining Agent Browser with Vercel Sandbox turns browser automation that traditionally required an always-on server into a serverless workload, significantly reducing infrastructure costs.

Token Efficiency Advantage — Quantitative Comparison with Playwright MCP

Agent Browser's greatest technical advantage is the dramatic reduction in tokens consumed by AI agents. Official Vercel benchmarks demonstrate an 82.5% token reduction compared to Playwright MCP (Model Context Protocol).

Comparison Example: GitHub Login Flow Operations

| Method | Required Tokens | Relative Comparison |
| --- | --- | --- |
| Playwright MCP (CSS selector specification) | Approx. 12,000 tokens | 100% |
| Agent Browser (accessibility ref) | Approx. 2,100 tokens | 17.5% |

This difference arises because Playwright MCP requires the AI to generate and maintain detailed CSS selectors like `button[data-test-id='login-submit']`, while Agent Browser needs only short refs like `@e5`.

Impact on API Billing

Assuming Claude 3.5 Sonnet (input token cost: $3 per million tokens) and 1,000 browser operations:

- Playwright MCP: 12,000 tokens × 1,000 operations = 12 million tokens → approx. $36 USD
- Agent Browser: 2,100 tokens × 1,000 operations = 2.1 million tokens → approx. $6.30 USD

That is a saving of approximately $30 USD ($29.70) per 1,000 operations. Furthermore, within the same context budget (e.g., 200,000 tokens), Agent Browser fits about 5.7 times as many operations (12,000 ÷ 2,100 ≈ 5.7). For large-scale E2E test suites or 24/7 monitoring agents, this difference can translate to annual cost savings of thousands to tens of thousands of dollars.
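The arithmetic above can be reproduced with a short script. This sketch uses only the figures quoted in this section (12,000 and 2,100 tokens per flow, $3 per million input tokens) and, like the text, treats every token as an input token; the function and variable names are illustrative.

```javascript
// Figures quoted in this section; every token is treated as an input token.
const TOKENS_PER_FLOW = { playwrightMcp: 12000, agentBrowser: 2100 };
const USD_PER_MILLION_INPUT_TOKENS = 3;

// Cost in USD of running `runs` flows at `tokensPerFlow` tokens each.
function costUsd(tokensPerFlow, runs) {
  return ((tokensPerFlow * runs) / 1e6) * USD_PER_MILLION_INPUT_TOKENS;
}

// Fraction of tokens saved by `candidate` relative to `baseline`.
function reduction(baseline, candidate) {
  return 1 - candidate / baseline;
}

const runs = 1000;
const playwrightCost = costUsd(TOKENS_PER_FLOW.playwrightMcp, runs);
const agentBrowserCost = costUsd(TOKENS_PER_FLOW.agentBrowser, runs);
// How many Agent Browser flows fit in the tokens of one Playwright MCP flow.
const flowRatio = TOKENS_PER_FLOW.playwrightMcp / TOKENS_PER_FLOW.agentBrowser;
const saved = reduction(TOKENS_PER_FLOW.playwrightMcp, TOKENS_PER_FLOW.agentBrowser);

console.log(`Playwright MCP: $${playwrightCost.toFixed(2)}`);               // $36.00
console.log(`Agent Browser:  $${agentBrowserCost.toFixed(2)}`);             // $6.30
console.log(`Savings:        $${(playwrightCost - agentBrowserCost).toFixed(2)}`); // $29.70
console.log(`Token reduction: ${(saved * 100).toFixed(1)}%`);               // 82.5%
console.log(`Flows per budget ratio: ${flowRatio.toFixed(1)}x`);            // 5.7x
```

Changing `runs` or the per-million price makes it easy to re-estimate the savings for a different model or workload.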

Version History and Latest Trends — Rapid Evolution from v0.6 to v0.24

Since the v0.6.0 public release in January 2026, Agent Browser has undergone over 60 version updates in just three months. Major milestones include:

| Version | Release Date | Key Changes |
| --- | --- | --- |
| v0.6.0 | January 2026 | Initial public release |
| v0.10.0 | Late January 2026 | Drag & drop support |
| v0.16.0 | Mid-February 2026 | Native Rust daemon introduction |
| v0.20.0 | Early March 2026 | Full Rust implementation completed, Node.js dependency removed |
| v0.23.0 | Late March 2026 | Observability dashboard added |
| v0.24.0 | April 1, 2026 | AWS Bedrock AgentCore support |
| v0.24.1 | April 4, 2026 | Stability improvements and bug fixes (latest) |

The full Rust implementation in v0.20.0 was a major turning point: it reduced startup time by approximately 40% and memory usage by about 30% compared with earlier versions. The v0.23.0 Observability dashboard enables real-time monitoring of AI agent operations, which is crucial for meeting audit requirements during enterprise deployment. With v0.24.0 support for AWS Bedrock AgentCore, Agent Browser can now be used directly from Amazon Bedrock AI agents (Claude on AWS, Titan, etc.), strengthening integration with the AWS ecosystem. The future roadmap includes mobile browser support (iOS Safari, Android Chrome), parallel session management, and cloud recording features.

Deployment Best Practices — Key Points for Production Operation

Recommended settings and best practices for operating Agent Browser in production environments:

1. Enable Machine-Readable Output

```bash
export NO_COLOR=1
```

Setting the environment variable `NO_COLOR=1` disables ANSI color codes, producing output that AI agents can parse accurately.

2. Session Persistence for Maintaining Login State

```bash
agent-browser session save myapp-session
# Restore later
agent-browser session restore myapp-session
```

Save and reuse logged-in sessions instead of executing authentication flows repeatedly, reducing execution time.

3. Strict Adherence to Snapshot-Ref Pattern

Always use refs obtained from `snapshot -i` instead of directly specifying CSS/XPath selectors. This keeps operations stable even on dynamically changing web pages.

4. Error Handling and Automatic Alert Dialog Dismissal

Agent Browser automatically dismisses alert dialogs, but for unexpected popups, you can explicitly close them with `agent-browser dismiss`.

5. Headless Mode and CI/CD Integration

```bash
agent-browser --headless open https://example.com
```

When executing in GitHub Actions or CircleCI, use the `--headless` flag for UI-less execution.

6. Timeout and Retry Settings

Set timeouts like `--timeout 30000` (30 seconds) for network and page load delays, and implement retry logic on the AI agent side for failures.

7. Security and Vault Utilization

Always save passwords and sensitive information with `agent-browser vault save` for encrypted storage, ensuring raw data doesn't remain in prompts or logs. Vault data is encrypted with AES-256.

These practices enable safe and efficient operation of Agent Browser in enterprise environments.
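Item 6 leaves retry logic to the agent side. A minimal Node.js sketch of that idea follows. It assumes `agent-browser` is on the PATH and signals failure with a non-zero exit code, which this guide does not confirm; `runAgentBrowser` and `withRetry` are hypothetical helper names, not part of the tool.

```javascript
const { spawnSync } = require("node:child_process");

// Run one agent-browser command synchronously with NO_COLOR=1 (item 1).
// Assumption: a non-zero exit status (or a spawn error) means failure.
function runAgentBrowser(args) {
  const result = spawnSync("agent-browser", args, {
    encoding: "utf8",
    env: { ...process.env, NO_COLOR: "1" },
  });
  return {
    ok: result.status === 0,
    stdout: result.stdout || "",
    stderr: result.stderr || "",
  };
}

// Call `run(args)` until it succeeds or maxAttempts is reached.
// Returns the last result plus the number of attempts made.
function withRetry(run, args, maxAttempts = 3) {
  let last = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = run(args);
    if (last.ok) {
      return { ...last, attempts: attempt };
    }
  }
  return { ...last, attempts: maxAttempts };
}

// Usage (not executed here): retry opening a page up to three times.
// const opened = withRetry(runAgentBrowser, ["open", "https://example.com"]);
```

The runner is passed in as a parameter so the retry policy can be tested with a fake command; a production version would likely add a delay between attempts and log `stderr` for each failure.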

FAQ — Frequently Asked Questions and Answers

Q1: Is Agent Browser free to use?
A1: Yes, it is completely free. It is released under the Apache 2.0 license, which permits unlimited commercial use. When combined with Vercel Sandbox, Vercel's pricing plans apply (free tier: up to 1,000 executions per month).

Q2: Does it work on Windows?
A2: Native Windows installation is possible via npm/cargo, but WSL2 is recommended. macOS and Linux are fully supported.

Q3: Is migration easy for those with Playwright experience?
A3: Yes. Scripts written in Playwright are converted into CLI commands executed by AI agents. No API rewriting is needed, because the AI generates the appropriate commands from natural language instructions.

Q4: How do I manage authentication credentials for private sites?
A4: Use the Authentication Vault feature. Save the username and password locally, in encrypted form, with `agent-browser vault save mysite`, then use them with `agent-browser vault use mysite`. This is safe because raw passwords aren't passed to LLM prompts.

Q5: Which AI agents can use it?
A5: Any AI agent capable of executing Bash commands. It is verified to work with all major AI coding assistants, including Claude Code, Cursor, GitHub Copilot, Goose, and Windsurf.

Q6: Can it be integrated into CI/CD pipelines?
A6: Yes. It can run in headless mode (`--headless`) on GitHub Actions, CircleCI, Jenkins, etc. Docker images are also published on GitHub Container Registry for container environments.

Q7: How does speed compare to Puppeteer?
A7: No direct Puppeteer benchmark is given here. The full Rust implementation in v0.20.0 reduced Agent Browser's startup time by approximately 40% compared with earlier versions, and the client-daemon architecture keeps sessions alive between commands, which significantly improves perceived speed for consecutive operations.

Q8: Can screenshot quality be adjusted?
A8: Yes. Quality can be specified like `agent-browser screenshot --quality 90`. It also supports `--full-page` for entire page capture and `--clip` for specific region capture.

Oflight's Browser Automation and AI Agent Implementation Support Services

Oflight provides browser automation and AI agent implementation support services utilizing Agent Browser.

Support Services:

- Agent Browser environment setup and initial configuration support
- Migration of existing Playwright/Puppeteer scripts to Agent Browser
- Workflow optimization with AI agents (Claude Code/Cursor, etc.)
- Custom automation scenario design and implementation
- Serverless browser automation using Vercel Sandbox
- Enterprise security audit compliance
- 24/7 monitoring agent construction

Pricing Estimates:

- Initial implementation support: $13,000 to $20,000 USD (includes environment setup and training)
- Custom scenario development: $3,300 to $10,000 USD per scenario
- Monthly operation and maintenance: $2,000 to $6,600 USD (includes monitoring and improvements)

By leveraging Agent Browser for business automation, we have achieved approximately 120 hours of annual reduction in form input tasks and approximately 70% reduction in E2E test execution time. Companies looking to streamline business processes through AI agent-driven browser automation are encouraged to consider Oflight's AI consulting services.

Learn more: Oflight's AI Consulting Services

For inquiries, please contact us via the website contact form or email.
