Source: AXI — Agent eXperience Interface

Author: Kun Chen
Source: axi.md (AIWeb Dev)

Summary

This paper introduces AXI, a set of 10 design principles for building agent-ergonomic CLI tools. The core argument is that the “CLI vs MCP” debate for agent-tool interfaces is misframed — what matters is principled design, not protocol choice. AXI achieves 100% task success at the lowest cost ($0.074/task), fastest duration (21.5s), and fewest turns (4.5) across a 490-run browser automation benchmark, leading all seven conditions on every metric simultaneously. A separate 425-run GitHub benchmark validates the same pattern.

Key Claims

  • MCP schema overhead is substantial. MCP conditions average 185K input tokens per task vs. 79K for AXI — a 2.3x overhead that compounds across multi-step tasks. Exposing ~30 tool schemas inflates context regardless of how many tools the agent actually uses.
  • CLI and MCP both fail on discoverability. CLI agents must guess subcommands or read --help. MCP agents with lazy loading guess wrong tool names (e.g., selecting take_screenshot, which returns a 1 MB base64 PNG, instead of take_snapshot, which returns 80 KB of text). Neither provides in-context guidance on what to do next.
  • Principled CLI design beats both raw CLI and MCP. AXI achieves MCP’s reliability advantages (structured output, discoverability) at CLI’s cost profile by applying 10 design principles organized into four categories: efficiency, robustness, discoverability, and help.
  • Specialized commands collapse multi-turn interactions. AXI’s tables --url command reduces an 11-turn, $0.194 extraction task to 2 turns at $0.047 — a 3.1x cost reduction. Combined operations (open = navigate + snapshot; fill --submit = fill + submit + wait + snapshot) eliminate the separate calls MCP conditions require.
  • Code-writing pays a coordination tax. Code-mode approaches (writing TypeScript against MCP tools) achieve 100% reliability but cost 1.6x more than AXI ($0.120 vs $0.074). The write-run-debug loop adds turns, though script batching can win on multi-site tasks.
  • MCP Compressor validates CLI-over-MCP. Atlassian’s MCP Compressor wraps any MCP server into CLI subcommands, achieving 100% success at $0.091/task — the single change from MCP to CLI eliminates schema overhead and enables shell composability. The remaining 23% gap to AXI comes from lacking combined operations.
  • TOON format saves ~40% tokens over JSON. Token-Oriented Object Notation omits braces, quotes, and commas while remaining unambiguous to LLMs.
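The TOON claim above can be made concrete with a rough sketch. The encoder below is a hypothetical, simplified TOON-like tabular encoding (a header naming the fields once, then one comma-separated row per object), not the official TOON specification, and character counts stand in for tokens; the data and function name are invented for illustration.

```python
import json

def toonish(name: str, rows: list[dict]) -> str:
    """Encode a uniform list of objects in a simplified TOON-like tabular form."""
    fields = list(rows[0])
    # Field names appear once in the header instead of repeating per object.
    lines = [f"{name}[{len(rows)}]{{{','.join(fields)}}}:"]
    lines += ["  " + ",".join(str(r[f]) for f in fields) for r in rows]
    return "\n".join(lines)

rows = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
as_json = json.dumps(rows)
as_toon = toonish("users", rows)
print(as_toon)
print(f"JSON: {len(as_json)} chars, TOON-like: {len(as_toon)} chars")
```

The savings come from stripping per-object keys, braces, and quotes, which is exactly the repetition JSON pays for on every row of a uniform list.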

The 10 AXI Principles

Efficiency:

  1. Token-efficient output (TOON format)
  2. Minimal default schemas (3–4 fields, not 10+)
  3. Content truncation with size hints and escape hatches

Robustness:

  4. Pre-computed aggregates (total counts, CI summaries)
  5. Definitive empty states (explicit “0 results”)
  6. Structured errors, exit codes, idempotent mutations, no interactive prompts

Discoverability:

  7. Ambient context (self-install into session hooks with dashboard)
  8. Content first (no-args shows live data, not help text)
  9. Contextual disclosure (next-step commands appended after output)

Help:

  10. Consistent per-subcommand --help as fallback
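Several of the robustness and discoverability principles can be sketched together in a few lines. The snippet below is a hypothetical illustration, not code from the paper: the command shape, field names, and output format are all invented to show principles 4, 5, 6, and 9 in one place.

```python
# Sketch of an AXI-style subcommand renderer (hypothetical names throughout).
import json
import sys

def list_issues(issues: list[dict]) -> int:
    """Render results AXI-style; return a process exit code."""
    if not issues:
        # Principle 5: a definitive empty state, not empty output.
        print("0 results")
        return 0
    # Principle 4: pre-computed aggregate up front, so the agent
    # never has to count lines itself.
    print(f"{len(issues)} issues ({sum(i['open'] for i in issues)} open)")
    for i in issues:
        # Principle 2: minimal default schema (a few fields, not 10+).
        print(f"#{i['id']}  {i['title']}  open={i['open']}")
    # Principle 9: contextual disclosure of plausible next steps.
    print("next: issues --id <n> | issues --closed")
    return 0

def fail(message: str, code: int) -> int:
    # Principle 6: structured, machine-readable error plus a nonzero exit code.
    print(json.dumps({"error": message, "exit_code": code}), file=sys.stderr)
    return code
```

The point of the sketch is that each principle is a property of the output contract, not of the transport, which is why the paper argues they apply equally to a CLI or an MCP server.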

Benchmark Results

Browser automation (490 runs, 14 tasks, 7 conditions):

Condition                       Success   Avg Cost   Avg Turns
chrome-devtools-axi             100%      $0.074     4.5
dev-browser                      99%      $0.078     4.9
agent-browser                    99%      $0.088     4.8
chrome-devtools-mcp-compressed  100%      $0.091     7.6
chrome-devtools-mcp              99%      $0.100     6.2
chrome-devtools-mcp-code        100%      $0.120     6.4

GitHub API (425 runs, 17 tasks, 5 conditions): AXI at 100% success, $0.050/task vs. 86%/$0.054 for CLI and 82–87%/$0.101–$0.148 for MCP. Cost gap widens dramatically on complex tasks: ci_failure_investigation costs $0.065 for AXI vs. $0.758 for MCP (12x).

Relevance and Implications

This work reframes the agent-tool interface question from protocol choice to design principles. For anyone building tools that AI Agents will consume — whether APIs, CLIs, or MCP servers — the 10 AXI principles provide a concrete, benchmarked framework. The key insight is that token budget should be treated as a first-class design constraint, on par with latency or correctness.
