Daemon Documentation

These docs cover inference-relay v1.1 and later — the standalone Rust daemon distribution. For the v1.0 npm library, see Library (Legacy) Docs. Your license key validates both.

The daemon binds 127.0.0.1:7421 and accepts Anthropic-shape HTTP. Point any Anthropic SDK at that address by overriding baseURL, and your calls route through the user's Claude subscription. ~10 ms warm-pool overhead. ~2 s on first cold-spawn.

Where to start

Quickstart
Install and make your first call in five minutes.
Overview
What the daemon is, how it works, why we built it.
SDK Integration
Point Python / Node / Go / Rust against localhost:7421.
Headless
Survive application restarts — Task Scheduler / launchd / systemd.
Sessions
Stateless default. Sticky via X-IR-Session-ID.
Agents Cookbook
Five recipes for orchestrators.
Tools
Caller-defined tools forwarded to Claude Code.
Attachments
Vision, PDFs, base64 payloads in the HTTP body.
API Reference
12 documented endpoints. Anthropic-shape responses.
Troubleshooting
Common failure modes and fixes.
Daemon Lifecycle
Install, start, update, uninstall.
Migrating from v1.0
How to move from the npm library.
Security
Loopback, JWS, signed auto-updates.

What v1.1 does NOT include

  • SSE streaming — stream: true is accepted but no-op. The daemon returns a buffered JSON response. (v1.2 roadmap.)
  • Multi-provider cascade — v1.1 is Claude-only. The v1.0 library still has the Claude / Anthropic API / OpenAI / Ollama cascade.
  • Apple Developer ID / Windows EV signing — launch ships Gatekeeper-warned / SmartScreen-warned.