MCP Server

@inference-relay/mcp is a Model Context Protocol server that turns any MCP-compatible IDE into a live operational console for your inference-relay account. Its 19 tools cover financial intelligence, operational health, security verification, manifest sync, and fleet management.

Once installed, ask your IDE in plain English — “show me my inference-relay savings this week” or “verify the JWS handshake” — and the model picks the right tool automatically.

Prerequisites

  • An inference-relay license key — get one at inference-relay.com/pricing
  • Node.js 18+ (for the npx install path)
  • An MCP-compatible client — Claude Code, Claude Desktop, Cursor, or any other MCP host

Setup — Claude Code (recommended)

The claude mcp add command writes the MCP entry for you, so no JSON editing is required. Run it from your terminal:

claude mcp add inference-relay \
  --env IR_LICENSE_KEY=ir_live_xxxx \
  -- npx -y @inference-relay/mcp

Replace ir_live_xxxx with your real key. Then verify the entry:

claude mcp get inference-relay

You should see Status: ✓ Connected and an Environment block with your IR_LICENSE_KEY.

Setup — Claude Desktop / Cursor / other MCP clients

Edit your client's MCP config file. On macOS, Claude Desktop's lives at:

~/Library/Application Support/Claude/claude_desktop_config.json

Add this entry under mcpServers (create the section if it doesn't exist):

{
  "mcpServers": {
    "inference-relay": {
      "command": "npx",
      "args": ["-y", "@inference-relay/mcp"],
      "env": {
        "IR_LICENSE_KEY": "ir_live_xxxx"
      }
    }
  }
}

The -y flag is important: it answers yes to the install prompt automatically. Without it, npx blocks waiting for “OK to install? (y/N)” on first run and the MCP stdio handshake hangs.
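If the config file already lists other servers, hand-editing risks overwriting an entry or dropping a brace. The sketch below, written in Python purely for illustration, merges the entry shown above into an existing config dictionary and leaves other servers untouched. The helper name add_relay_server is hypothetical and is not part of @inference-relay/mcp; the key is the same placeholder used above.

```python
import copy
import json


def add_relay_server(config: dict, license_key: str) -> dict:
    """Return a copy of `config` with the inference-relay entry added.

    Hypothetical helper for illustration only; it is not shipped with the
    package. Existing entries under "mcpServers" are preserved.
    """
    merged = copy.deepcopy(config)
    # Create the mcpServers section if it doesn't exist yet.
    servers = merged.setdefault("mcpServers", {})
    servers["inference-relay"] = {
        "command": "npx",
        "args": ["-y", "@inference-relay/mcp"],  # -y skips the install prompt
        "env": {"IR_LICENSE_KEY": license_key},
    }
    return merged


if __name__ == "__main__":
    # "other-server" stands in for whatever is already configured.
    existing = {"mcpServers": {"other-server": {"command": "other"}}}
    print(json.dumps(add_relay_server(existing, "ir_live_xxxx"), indent=2))
```

To apply it to a real file, load the file with json.load, pass the result through the helper, and write it back with json.dump. Keeping a backup copy of the original file first is a sensible precaution.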

Verify the install

Restart your IDE (the MCP server is loaded at session start), then ask:

List the inference-relay MCP tools you have access to.

You should see 19 tool names. If you see fewer, the MCP server isn't connecting — check the command and args in your config.

You can also verify your license key independently with the bundled CLI:

IR_LICENSE_KEY=ir_live_xxxx npx inference-relay verify

[ir] IDENTITY:   VALIDATED
[ir] TIER:       PRO
[ir] RELAY:      ACTIVE
     Claude CLI: v2.1.96
     Tier:       max
     Usage:      19 / 5,000 (0.4%)
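For scripted checks (a CI step, for example), the verify output can be turned into key/value pairs. The following is a minimal sketch that assumes the output keeps the "label: value" layout of the sample above; that layout is not a documented contract, and the function names are hypothetical.

```python
import re

# Sample copied from the output shown above.
SAMPLE = """\
[ir] IDENTITY:   VALIDATED
[ir] TIER:       PRO
[ir] RELAY:      ACTIVE
     Claude CLI: v2.1.96
     Tier:       max
     Usage:      19 / 5,000 (0.4%)
"""


def parse_verify_output(text: str) -> dict:
    """Split each 'label: value' line into a dict entry.

    Assumes the layout of the sample above. Labels are kept case-sensitive
    because the sample contains both "TIER" and "Tier".
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("[ir]"):
            line = line[len("[ir]"):].strip()
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            fields[key.strip()] = value.strip()
    return fields


def usage_percent(usage: str) -> float:
    """Compute used / cap as a percentage from a string like '19 / 5,000 (0.4%)'."""
    match = re.match(r"([\d,]+)\s*/\s*([\d,]+)", usage)
    if match is None:
        raise ValueError(f"unrecognized usage string: {usage!r}")
    used, cap = (int(group.replace(",", "")) for group in match.groups())
    return 100.0 * used / cap


if __name__ == "__main__":
    fields = parse_verify_output(SAMPLE)
    print(fields["IDENTITY"], round(usage_percent(fields["Usage"]), 1))
```

In the sample, 19 of 5,000 works out to 0.38%, which the CLI displays rounded as 0.4%.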

Try these prompts

The model will pick the right tool for each prompt automatically:

  • “Show me inference-relay model pricing.” → get_model_pricing (no auth, instant)
  • “Verify inference-relay privacy integrity.” → verify_privacy_integrity
  • “What's my inference-relay usage cap?” → monitor_usage_caps
  • “Validate the inference-relay JWS handshake.” → validate_jws_handshake
  • “Show me my inference-relay fleet status.” → get_fleet_status
  • “Show me inference-relay savings this month.” → get_savings_summary

The 19 tools

Financial Intelligence (5)

  • get_savings_summary — Real-time P&L: relay cost vs direct API cost
  • analyze_workflow_efficiency — Per-model ROI ranking from recent events
  • get_projected_burn — Monthly forecast: with relay vs without relay
  • monitor_usage_caps — Current usage against tier cap with gauge
  • get_model_pricing — Per-model pricing table for all providers

Operational Health (5)

  • probe_provider_availability — Status grid for CLI / API / Ollama
  • get_duration_benchmarks — p50 / p95 / p99 latency per provider
  • list_fallback_events — Recent cascade events with failure reasons
  • explain_fallback_chain — Plain English narration of a fallback chain
  • probe_environment — Bloomberg-style environment diagnostic

Security & Compliance (4)

  • verify_privacy_integrity — Compile-time guarantee that audit events never carry prompt content
  • get_audit_trail — Audit entries with optional SHA-256 hash chain verification
  • scan_leak_telemetry — Runtime scan for anomalous telemetry strings
  • validate_jws_handshake — RS256 verification of /v1/validate and /v1/manifest

Manifest Sync (2)

  • check_manifest_sync — Compare local LAST_KNOWN_GOOD against remote manifest
  • simulate_cli_drift — What-if: simulate a manifest field change and report impact

Fleet Management (3)

  • rotate_relay_keys — Generate new key, revoke old (Pro / Enterprise)
  • get_fleet_status — Status table for all keys in your fleet
  • get_activity_log — Operational activity log with filtering by event type
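When the IDE lists fewer than 19 tools, it helps to know exactly which ones are absent. The sketch below records the catalog above as a lookup table and reports the gaps by category. The tool names are taken from the lists above; the helper missing_tools is hypothetical, and the list of names the IDE reports has to be pasted in by hand.

```python
# Tool names by category, as listed above.
TOOLS = {
    "Financial Intelligence": [
        "get_savings_summary", "analyze_workflow_efficiency",
        "get_projected_burn", "monitor_usage_caps", "get_model_pricing",
    ],
    "Operational Health": [
        "probe_provider_availability", "get_duration_benchmarks",
        "list_fallback_events", "explain_fallback_chain", "probe_environment",
    ],
    "Security & Compliance": [
        "verify_privacy_integrity", "get_audit_trail",
        "scan_leak_telemetry", "validate_jws_handshake",
    ],
    "Manifest Sync": ["check_manifest_sync", "simulate_cli_drift"],
    "Fleet Management": [
        "rotate_relay_keys", "get_fleet_status", "get_activity_log",
    ],
}


def missing_tools(listed: list) -> dict:
    """Map each category to the expected tools absent from `listed`.

    Categories with nothing missing are omitted from the result.
    """
    seen = set(listed)
    gaps = {}
    for category, names in TOOLS.items():
        absent = [name for name in names if name not in seen]
        if absent:
            gaps[category] = absent
    return gaps


if __name__ == "__main__":
    # Example: the IDE reported only the public pricing tool.
    print(missing_tools(["get_model_pricing"]))
```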

Troubleshooting

MCP server fails to connect

The most common cause is a missing -y flag in args. Without it, npx blocks waiting for install confirmation and the MCP stdio handshake times out. Use npx -y @inference-relay/mcp.

License-gated tools say IR_LICENSE_KEY is required

The env block in your MCP config wasn't picked up. For Claude Code, verify with claude mcp get inference-relay; the output should include an Environment section. If it doesn't, remove the entry and re-add it with the --env flag.
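The two failure modes above (a missing -y flag and a missing env block) can be caught before restarting the IDE by inspecting the config entry. This is a minimal sketch with a hypothetical lint_entry helper; it checks only the fields this page describes and does not contact the inference-relay service.

```python
PLACEHOLDER_KEY = "ir_live_xxxx"  # the placeholder used throughout this page


def lint_entry(entry: dict) -> list:
    """Return human-readable problems found in one mcpServers entry.

    Hypothetical helper for illustration; an empty list means none of the
    issues described on this page were detected.
    """
    problems = []
    args = entry.get("args", [])
    # npx accepts -y or --yes to skip the install confirmation prompt.
    if entry.get("command") == "npx" and "-y" not in args and "--yes" not in args:
        problems.append("args is missing -y: npx will wait for install confirmation")
    key = entry.get("env", {}).get("IR_LICENSE_KEY")
    if not key:
        problems.append("env.IR_LICENSE_KEY is not set")
    elif key == PLACEHOLDER_KEY:
        problems.append("IR_LICENSE_KEY is still the placeholder value")
    return problems


if __name__ == "__main__":
    broken = {"command": "npx", "args": ["@inference-relay/mcp"]}
    for problem in lint_entry(broken):
        print("-", problem)
```

An entry that uses the globally installed binary instead of npx skips the -y check, since no install prompt is involved in that case.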

Tools return “401 Invalid license key”

The key is wrong, expired, or revoked. Run npx inference-relay verify to test the key independently of MCP, or check the dashboard at inference-relay.com/dashboard.

Want to skip the npx slowness?

Pre-install the MCP package globally and use the binary directly:

npm install -g @inference-relay/mcp

Then in your MCP config:

{
  "mcpServers": {
    "inference-relay": {
      "command": "inference-relay-mcp",
      "env": {
        "IR_LICENSE_KEY": "ir_live_xxxx"
      }
    }
  }
}

Next Steps