Source: AXI — Agent eXperience Interface
Summary
This paper introduces AXI, a set of 10 design principles for building agent-ergonomic CLI tools. The core argument is that the “CLI vs MCP” debate for agent-tool interfaces is misframed — what matters is principled design, not protocol choice. AXI achieves 100% task success at the lowest cost ($0.074/task), fastest duration (21.5s), and fewest turns (4.5) across a 490-run browser automation benchmark, leading all seven conditions on every metric simultaneously. A separate 425-run GitHub benchmark validates the same pattern.
Key Claims
- MCP schema overhead is substantial. MCP conditions average 185K input tokens per task vs. 79K for AXI — a 2.3x overhead that compounds across multi-step tasks. Exposing ~30 tool schemas inflates context regardless of how many tools the agent actually uses.
- CLI and MCP both fail on discoverability. CLI agents must guess subcommands or read --help. MCP agents with lazy loading guess wrong tool names (e.g., selecting take_screenshot, which returns a 1MB base64 PNG, instead of take_snapshot, which returns 80KB of text). Neither provides in-context guidance on what to do next.
- Principled CLI design beats both raw CLI and MCP. AXI achieves MCP’s reliability advantages (structured output, discoverability) at CLI’s cost profile by applying 10 design principles organized into four categories: efficiency, robustness, discoverability, and help.
- Specialized commands collapse multi-turn interactions. AXI’s tables --url command reduces an 11-turn, $0.194 extraction task to 2 turns at $0.047, a 3.1x cost reduction. Combined operations (open = navigate + snapshot; fill --submit = fill + submit + wait + snapshot) eliminate the separate calls MCP conditions require.
- Code-writing pays a coordination tax. Code-mode approaches (writing TypeScript against MCP tools) achieve 100% reliability but cost 1.6x more than AXI ($0.120 vs. $0.074). The write-run-debug loop adds turns, though script batching can win on multi-site tasks.
- MCP Compressor validates CLI-over-MCP. Atlassian’s MCP Compressor wraps any MCP server into CLI subcommands, achieving 100% success at $0.091/task — the single change from MCP to CLI eliminates schema overhead and enables shell composability. The remaining 23% gap to AXI comes from lacking combined operations.
- TOON format saves ~40% tokens over JSON. Token-Optimized Object Notation omits braces, quotes, and commas while remaining unambiguous to LLMs.
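To make the token-savings claim concrete, here is a minimal sketch of a TOON-like tabular encoding for uniform records. This is an illustrative approximation of the idea (header declares count and field names once; rows carry only values), not the official TOON specification; the `toon_like` helper and its exact syntax are assumptions for demonstration.

```python
import json

def toon_like(rows: list[dict]) -> str:
    """Encode a uniform list of records in a TOON-like tabular form:
    one header line declaring count and field names, then one
    comma-separated row per record (no braces, quotes, or repeated keys)."""
    fields = list(rows[0])
    header = f"rows[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + ",".join(str(r[f]) for f in fields) for r in rows]
    return "\n".join([header, *body])

records = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "viewer"},
]

compact = toon_like(records)
print(compact)
print(f"chars: {len(compact)} vs JSON: {len(json.dumps(records))}")
```

The savings come from stating each key once in the header rather than repeating it per record, which is where JSON spends most of its characters on homogeneous lists.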
The 10 AXI Principles
Efficiency:
1. Token-efficient output (TOON format)
2. Minimal default schemas (3–4 fields, not 10+)
3. Content truncation with size hints and escape hatches

Robustness:
4. Pre-computed aggregates (total counts, CI summaries)
5. Definitive empty states (explicit “0 results”)
6. Structured errors, exit codes, idempotent mutations, no interactive prompts

Discoverability:
7. Ambient context (self-install into session hooks with dashboard)
8. Content first (no-args shows live data, not help text)
9. Contextual disclosure (next-step commands appended after output)

Help:
10. Consistent per-subcommand --help as fallback
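Several of these principles concern what a command prints. The sketch below shows how pre-computed aggregates (4), definitive empty states (5), and contextual disclosure (9) might combine in one output renderer; the command names and fields are hypothetical illustrations, not AXI's actual interface.

```python
def render_results(results: list[dict], query: str) -> str:
    """Render a hypothetical search subcommand's output following:
    - principle 4: a pre-computed aggregate (total count) up front,
    - principle 5: a definitive empty state instead of silent emptiness,
    - principle 9: next-step commands appended after the data."""
    if not results:
        return "\n".join([
            f'0 results for "{query}"',  # explicit, not just blank output
            "Next: broaden the query, or run `tool list` to browse all items",
        ])
    lines = [f"{len(results)} results"]  # agent never has to count rows itself
    for r in results:
        lines.append(f"  {r['id']}  {r['title']}")
    # Contextual disclosure: tell the agent what it can do next.
    lines.append("Next: `tool show <id>` for details, `tool open <id>` to act on one")
    return "\n".join(lines)

print(render_results([{"id": 7, "title": "flaky CI job"}], "ci"))
print(render_results([], "zzz"))
```

The point of each line is that the agent reads it in context: the count and empty state remove ambiguity, and the trailing suggestions replace a round-trip to --help.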
Benchmark Results
Browser automation (490 runs, 14 tasks, 7 conditions):
| Condition | Success | Avg Cost | Avg Turns |
|---|---|---|---|
| chrome-devtools-axi | 100% | $0.074 | 4.5 |
| dev-browser | 99% | $0.078 | 4.9 |
| agent-browser | 99% | $0.088 | 4.8 |
| chrome-devtools-mcp-compressed | 100% | $0.091 | 7.6 |
| chrome-devtools-mcp | 99% | $0.100 | 6.2 |
| chrome-devtools-mcp-code | 100% | $0.120 | 6.4 |
GitHub API (425 runs, 17 tasks, 5 conditions): AXI at 100% success, $0.050/task vs. 86%/$0.054 for CLI and 82–87%/$0.101–$0.148 for MCP. Cost gap widens dramatically on complex tasks: ci_failure_investigation costs $0.065 for AXI vs. $0.758 for MCP (12x).
Relevance and Implications
This work reframes the agent-tool interface question from protocol choice to design principles. For anyone building tools that AI agents will consume, whether APIs, CLIs, or MCP servers, the 10 AXI principles provide a concrete, benchmarked framework. The key insight is that token budget should be treated as a first-class design constraint, on par with latency or correctness.