Tappi gives AI agents control of your real browser — with your sessions, your cookies, your extensions. No screenshots. No DOM dumps. No token tax.
Screenshot-based tools burn thousands of tokens per click. DOM dumpers flood your context window with noise. Headless browsers get blocked by every major site. There had to be a better way.
Screenshot-based tools: Vision models squint at pixels, guess coordinates, pray they click right. 5-10K tokens per interaction.
DOM dumpers: Entire accessibility trees, 50K+ tokens of nested divs. The LLM reads a novel just to click a button.
Headless browsers: No cookies. No sessions. Reddit, Gmail, LinkedIn: all blocked at the front door. CAPTCHAs everywhere.
Every feature exists because we hit a wall with existing tools and built the thing we needed.
Compact indexed element lists instead of screenshots or DOM dumps. The LLM reasons less and acts faster.
Reddit, Gmail, GitHub — all use shadow DOM. Tappi pierces through automatically. Accessibility trees can't.
Connects to your existing Chrome via CDP. Your sessions, cookies, extensions. No login walls. No CAPTCHAs.
Full MCP server for Claude Desktop, Cursor, Windsurf, and any MCP client. stdio + HTTP/SSE transport.
Payment forms, CAPTCHA solvers, OAuth popups — coordinate commands handle cross-origin boundaries.
6 tools (browser, files, PDF, spreadsheets, shell, cron). 7 LLM providers. Web UI with live tool visibility.
One browser. One workspace directory. No filesystem access beyond what you define. Deliberate constraints.
pip install tappi. Or double-click the .mcpb bundle for Claude Desktop. One install, all features.
Use from the command line, import as a Python library, or connect as an MCP server. Same power, three interfaces.
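To picture what a "compact indexed element list" might look like, here is a minimal sketch. The `Element` type, the `index_elements` helper, and the `[n] role: label` line format are all hypothetical illustrations, not tappi's actual output format. The point is only that each interactive element becomes one short numbered line the LLM can refer to by index.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for one interactive element on a page."""
    role: str   # e.g. "button", "link", "input"
    label: str  # visible text or accessible name


def index_elements(elements: list[Element]) -> str:
    """Render each element as one short numbered line (illustrative format)."""
    return "\n".join(
        f"[{i}] {el.role}: {el.label}" for i, el in enumerate(elements)
    )


# A made-up page with three interactive elements.
page = [
    Element("link", "Home"),
    Element("input", "Search"),
    Element("button", "Sign in"),
]
listing = index_elements(page)
print(listing)
```

With a listing like this, an agent can answer "click 2" instead of guessing pixel coordinates or parsing nested markup.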
4 tools. 3 real-world tasks. Same model, same thinking level, same instructions. Only one went 3/3 with correct data.
| | 🔹 tappi | 🔸 Browser Tool | 🔷 Playwright | 🔶 playwright-cli |
|---|---|---|---|---|
| Success Rate | 🟢 3/3 | 🟢 3/3 | 🟡 1/3* | 🔴 1/3 |
| Total Context | 59K | 252K | 44K | 52K |
| Total Time | 4m 13s | 8m 38s | 3m 42s | 3m 36s |
| Auth Tasks | ✅ | ✅ | ❌ | ❌ |
| Bot Detection | ✅ | ✅ | ✅ | ❌ |
| Shadow DOM | ✅ | ⚠️ Workaround | N/A | N/A |
| Data Quality | ⭐ High | ⭐ High | ⚠️ Low | N/A |
| Verdict | 🏆 Best overall | Reliable but heavy | Cheap but brittle | Too limited |
*Playwright's Reddit "success" returned automod bot comments instead of actual top comments on 4/5 posts — functionally incorrect.
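For the two tools that completed all three tasks, the gap can be worked out directly from the table. The sketch below only does arithmetic on the Total Context and Total Time rows. The variable names are ours, and nothing here is measured independently.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'Xm Ys' duration from the table into seconds."""
    return minutes * 60 + seconds


# Figures copied from the table rows "Total Context" and "Total Time".
tappi_context_k = 59
browser_tool_context_k = 252
tappi_time_s = to_seconds(4, 13)
browser_tool_time_s = to_seconds(8, 38)

context_ratio = browser_tool_context_k / tappi_context_k
time_ratio = browser_tool_time_s / tappi_time_s

print(f"context: {context_ratio:.1f}x, time: {time_ratio:.1f}x")
```

On these numbers, Browser Tool used roughly 4.3 times the context and took roughly twice as long as tappi for the same three tasks.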
You (CLI / Web UI / Claude Desktop)
         ↓
┌──────────────────┐
│    LLM Agent     │ ← Sees compact element lists, not DOM dumps
└────────┬─────────┘
         │
┌────────┴─────────┐
│    Tool Calls    │
├──────────────────┤
│ 🌐 Browser       │ → CDP → Your Chrome (with all your sessions)
│ 📁 Files         │ → Sandboxed workspace directory
│ 📄 PDF           │ → Read/create PDFs
│ 📊 Spreadsheets  │ → CSV/Excel (.xlsx)
│ 💻 Shell         │ → Optional, workspace-only
│ ⏰ Cron          │ → Scheduled recurring tasks
└──────────────────┘
No middleware. No cloud. No screenshots.
Just structured data flowing between your browser and your LLM.
Pick the one that fits your workflow.
MCP server: Claude Desktop, Cursor, Windsurf
Agent: Built-in, 7 LLM providers
CLI: Direct browser control
Python library: Import and build
pip install tappi
Python 3.10+ · Chrome or Chromium · Linux, macOS, Windows
We're not going to pretend tappi is perfect. Here's what other tools do better — and why we think our tradeoffs are the right ones.
Tappi's default flow doesn't use screenshots — it indexes elements into compact lists instead. Tools like Anthropic Computer Use or OpenAI Operator are vision-first: they "see" every page and reason about visual layout, button colors, spatial relationships. Tappi can take screenshots (tappi screenshot), but it's a fallback, not the primary interaction mode.
The tradeoff: Vision costs 5-10K tokens per screenshot. Tappi's element indexing costs ~200 tokens and is more reliable: the LLM never misclicks because it "saw" the button 3 pixels off. For 95% of browser tasks, you don't need to see the page; you need to interact with it. And when you do need a visual? tappi screenshot is one command away.
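To see how those per-step figures add up over a whole task, here is a rough back-of-the-envelope sketch. The 20-step task length is an assumption chosen for illustration. The per-step costs are the ones quoted above (5-10K tokens per screenshot, ~200 tokens per indexed listing).

```python
# Per-interaction costs quoted in the text above.
VISION_TOKENS_PER_STEP = (5_000, 10_000)  # low and high estimate
INDEXED_TOKENS_PER_STEP = 200             # approximate


def task_cost(steps: int) -> dict[str, int]:
    """Estimate total page-observation tokens for a task of `steps` interactions."""
    low, high = VISION_TOKENS_PER_STEP
    return {
        "vision_low": steps * low,
        "vision_high": steps * high,
        "indexed": steps * INDEXED_TOKENS_PER_STEP,
    }


# Assumption: a 20-interaction task, picked only as an example.
cost = task_cost(20)
print(cost)
```

Under these assumptions a 20-step task spends 100K to 200K tokens on screenshots versus about 4K on indexed listings. Real tasks will vary with page size and model.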
Playwright spins up a fresh browser anywhere — CI pipelines, Docker containers, serverless. Tappi needs Chrome running with a debug port. You can't run it in a GitHub Action without extra setup.
The tradeoff: That "limitation" is the feature. The running Chrome IS the point — it has your sessions, your cookies, your service workers, your extensions. A fresh headless Chromium is like an amnesiac trying to do your job. For testing and CI, use Playwright. For real-world agent work, use tappi.
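If you are unsure whether your Chrome is exposing a debug port, a small probe like the one below can help. It is a sketch that assumes Chrome was started with `--remote-debugging-port=9222` (the port number is a common convention, not something this page specifies) and uses Chrome's standard `/json/version` HTTP endpoint. It does not use tappi itself, and the helper names are ours.

```python
import json
import urllib.request
from typing import Optional


def version_url(host: str = "127.0.0.1", port: int = 9222) -> str:
    """Build the URL of Chrome's DevTools version endpoint."""
    return f"http://{host}:{port}/json/version"


def parse_version(payload: bytes) -> dict:
    """Pull the browser name and WebSocket URL out of the JSON response."""
    info = json.loads(payload)
    return {
        "browser": info.get("Browser"),
        "websocket": info.get("webSocketDebuggerUrl"),
    }


def probe_chrome(
    host: str = "127.0.0.1", port: int = 9222, timeout: float = 2.0
) -> Optional[dict]:
    """Return version info, or None if no debug endpoint answers."""
    try:
        with urllib.request.urlopen(version_url(host, port), timeout=timeout) as resp:
            return parse_version(resp.read())
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers a response that is not valid JSON.
        return None
```

Calling `probe_chrome()` returns a small dict when a debug-enabled Chrome is listening, and `None` otherwise, which is a quick way to confirm the prerequisite before pointing an agent at the browser.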
Playwright has Microsoft behind it, thousands of contributors, massive documentation, integrations with every CI system. Tappi is one developer and a growing community. Fewer Stack Overflow answers. Fewer tutorials.
The tradeoff: Every tool starts small. Playwright isn't designed for AI agents — it was built for testing. We're purpose-built for one thing: giving AI agents browser control that actually works. Smaller but sharper. And the code is simple enough to read in an afternoon.
Playwright supports Python, Node.js, .NET, and Java. Tappi is Python-only. If your stack is TypeScript/Node, you'll need to shell out or use the MCP server.
The tradeoff: Python is the lingua franca of AI/ML. Every major AI framework is Python-first. And the CLI + MCP server work from any language — your TypeScript agent can call tappi via MCP without writing a line of Python. Language boundaries dissolve when you speak protocol.
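"Speaking protocol" here means exchanging JSON-RPC 2.0 messages, which is what MCP clients and servers use. The sketch below builds the two standard MCP requests, `tools/list` and `tools/call`, as plain dictionaries and encodes them one per line, as the stdio transport expects. The tool name `example_open_url` and its `url` argument are hypothetical placeholders, not names from tappi's MCP server. A real client would also perform the MCP initialization handshake first, or simply use an MCP SDK.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # simple incrementing request ids


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def encode_line(message: dict) -> str:
    """Serialize one message as a single newline-terminated line (stdio transport)."""
    return json.dumps(message, separators=(",", ":")) + "\n"


# Ask the server which tools it offers.
list_tools = jsonrpc_request("tools/list")

# Call a tool by name. The name and arguments are hypothetical placeholders.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "example_open_url", "arguments": {"url": "https://example.com"}},
)

print(encode_line(list_tools), end="")
print(encode_line(call_tool), end="")
```

Because the messages are just JSON, the same two requests can be produced from TypeScript, Go, or any other language without touching Python.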
Tappi is not a Playwright replacement. It's a different tool for a different job. Playwright tests websites. Tappi uses them.
The introduction. Why existing AI browser tools waste tokens, and how tappi's indexed element approach changes the game.
The benchmark. 4 tools, 3 tasks, same model. Reddit extraction, Google Maps scraping, Gmail automation. Only one went 3/3.
The MCP server. 24 browser tools for Claude Desktop, Cursor, Windsurf. Zero-config .mcpb bundle. pip install tappi.