Changelog

inference-relay v1.1

Released: [date pending] · Headline: library → daemon. Any-language Anthropic SDK compatibility.

What's new

Standalone Rust daemon. Binds 127.0.0.1:7421. Accepts Anthropic-shape HTTP. Drop your SDK against it from Python, Node, Go, Rust, curl — any language.
Headless operation. Install once; daemon survives application restarts. Windows Task Scheduler + macOS launchd + Linux systemd recipes included in Daemon Docs / Headless.
Session Pool. Two pre-warmed PTYs idle at the prompt; ~10 ms warm-pool grab; ~2 s cold-spawn. (v1.1.10+ removed the 5-concurrent cap; see below.)
Stateless by default; sticky via X-IR-Session-ID. Honors the Anthropic SDK's stateless contract for plain /v1/messages calls; opts into sticky multi-turn when the header is set.
Tool use, vision, attachments — works unchanged. Standard Anthropic SDK shape; daemon forwards transparently.
ed25519-signed auto-updater. Daemon polls every 4h; signed bundles verified against an embedded pubkey.
POST /v1/sessions/<id>/reset. Pool-swap reset for sticky sessions; ~10 ms warm-pool grab.

Update — v1.1.9 (streaming + multi-provider)

Real SSE streaming. SDK calls with stream: true now receive incremental text_delta events at ~100 ms intervals as Claude renders, instead of a buffered burst at end-of-turn. Tool-use blocks emit post-cooldown.
Multi-provider cascade. Daemon now routes through claude-cli (priority 1) → anthropic-api (2) → openai (3) → ollama (4). Configure API keys in settings; the cascade automatically falls back on Unavailable. Per-call pin via X-IR-Provider: <id> header.
Linux installers. .AppImage, .deb, and .rpm packages now shipping. macOS Intel users get a real .dmg (no more extract+drag tar.gz). All four platforms now built via GitHub Actions CI.
First-launch hello-world test. The activation wizard now runs a tiny end-to-end test call after license validation, so you see Claude responding before being released to the dashboard. Catches the “license valid but Claude Code unauthenticated” case in-context.

Update — v1.1.10 (unlimited concurrent sessions)

No hard cap on sticky sessions. Removed the v1.1.9 LRU eviction at 5 concurrent. Devs running multi-agent topologies (planner + N workers + critic, swarm-of-50, etc.) pin arbitrary parallel sessions via X-IR-Session-ID; each gets its own isolated PTY with no cross-contamination.
Idle reaper replaces the cap. Sessions idle >30 min get GC'd (5-min sweep). Mid-task pauses (human thinking, paste, re-run) stay well under the threshold; zombie sessions get released so claude children don't accumulate.

Update — v1.1.11 (quality-of-life)

Login-shell PATH probe for claude resolution. When the static fallback list misses (asdf, mise, fnm, custom npm prefix, corp-IT PATH injection), the daemon now execs $SHELL -ilc 'which claude' to read whatever PATH your .zshrc/.zshenv configures. Should resolve ~95% of the “claude binary not found” cases that hit non-Homebrew installs. Also added ~/.local/bin, ~/.nodenv/shims, ~/.fnm/aliases/default/bin, and a walk of ~/.nvm/versions/node/v*/bin as static candidates.
Event-driven auto-update (no more 4h timer). The Tauri shell now waits on a tokio::sync::Notify signal; the daemon's /v1/messages handler fires it (debounced to ≤1/5min) after each successful call. Idle daemons stay idle; active daemons get updates within ~5 min of next call after a new release. Plus one eager check 30 s after launch so freshly-installed copies don't lag a version behind.
/v1/version reports claudePath. Was hardcoded claudeAvailable: false (TODO leftover). Now accurate + includes the resolved path so customers can diagnose where the daemon looked.
Wizard claude-not-found remediation. Failed-state Card detects “claude binary not found” patterns in the daemon error and surfaces a clean install one-liner (npm i -g @anthropic-ai/claude-code then claude) instead of the raw error string.

Update — v1.1.13 (sticky-session fix + QoL)

Sticky-session first-call regression — properly fixed. v1.1.10–v1.1.12 had a subtle bug where the first call to a fresh X-IR-Session-ID returned content: []. claude's ✻ Cogitating for Ns marker appears at the START of processing, not the end — daemon read this as “turn over” before claude wrote the response. Now: if extraction returns empty when the turn looks done, treat it as a false positive and keep polling. Verified 3-way concurrent test: 3/3 in 7s (was 0/3 on v1.1.11, 2/3 on v1.1.12).
Tray “Check for updates” — now wired. Click handler was a no-op since v1.1. Now calls the same check + auto-install as the Settings button and surfaces the result via native notification.
Auto-install on manual update check. The renderer's “Check for updates” button previously only checked and alerted “update available.” Now downloads + installs in place via the Tauri updater (ed25519-verified, app-bundle swap, restart).
Allow-tools checkbox now persists in Tauri builds. window.confirm() is silently disabled in Tauri 2; the dangerous-toggle gate was always failing closed. Replaced with a 2-click arm-then-confirm pattern + inline warning that auto-cancels after 5s.
Recent Activity prompt previews. Dashboard's Recent Activity rows now show the first ~120 chars of the user prompt instead of “(no prompt preview).”

Update — v1.1.14 (live activity log + agent inspector + doctor)

Live activity log. Dashboard now streams new calls in real time via SSE (/v1/activity/stream). No polling lag — calls appear the moment the daemon persists them. Slow consumers drop events (broadcast capacity 256); the dashboard's 4s poll reconciles via the lagged marker.
Agent Inspector. Click any Recent Activity row to open a modal showing the call ID, session ID, model, tokens, cost-avoided, error if any, full prompt, and the raw PTY transcript exactly as captured between cooldowns. Builds trust + helps SDK consumers debug their integrations.
inference-relay doctor CLI. New subcommand runs 5 diagnostic checks: claude binary resolvable, settings.json readable, license configured, daemon /v1/health reachable, update server reachable. Output: PASS/WARN/FAIL with detail. --json for machine-readable. Customers paste the output into support requests; 5-second triage.
Cost-avoided headline. Dashboard surfaces $X.XX saved vs Anthropic API rates as a hero card when usage accumulates. Value prop made visible every time you open the app.
Wizard preflight. License-activation wizard now probes /v1/version's claudeAvailable field BEFORE firing the hello-world test. If claude isn't resolvable, fails fast with the install one-liner (npm i -g @anthropic-ai/claude-code) instead of a 30-second timeout.
Backend: usage counter clamp fix. Dashboard's “Calls this cycle” was showing 0 for some customers because Stripe's currentPeriodStart can be in the future on a fresh license — filtering out every real event. Clamped to now on the relay backend; both the daemon and the dashboard now agree on the count.

Update — v1.1.15 (OpenAI Chat Completions inbound)

POST /v1/chat/completions — OpenAI shape now accepted. Apps coded against openai-python / @openai/openai-node / etc. point at IR with one line change (base_url → http://localhost:7421/v1). Translator handles request body (system / user / assistant / tool messages), response shape (choices[0].message.content + tool_calls + usage), stop_reason mapping (tool_use → tool_calls, max_tokens → length, etc.), and tools (function calling round-trip).
Streaming. OpenAI SSE shape (data: {choices: [{delta: {...}}]} chunks ending with data: [DONE]) emitted by an internal state-machine translator. Text + tool-call argument deltas stream across chunks matching the wire spec.
Same SDK guarantees as the Anthropic side. License gate, cap-exceeded, X-IR-Session-ID sticky sessions, X-IR-Provider hard-pinning all honored. Routes through the same cascade as /v1/messages.
Vision via image_url coming in v1.1.16 (next bullet). Initial v1.1.15 cut shipped text + tools; vision support followed immediately.

Update — v1.1.16 (OpenAI vision)

image_url content blocks supported on /v1/chat/completions. Pass images as data:image/png;base64,... URLs in the OpenAI-shape user message; the daemon translates to Anthropic image blocks and writes to the existing PTY attachment pipeline. Verified: tiny 1x1 PNG + “how many pixels?” → claude correctly answers “1”.
Remote http(s) image URLs not supported — by design. Would add a network fetch + SSRF surface inside the daemon for marginal benefit; nearly every openai-python example uses data: URLs anyway. Returns a clean error pointing to the base64 workaround if a remote URL is supplied.

By design — no code-signing

v1.1 ships unsigned (no Apple Developer ID; no Windows EV cert). Your OS will warn you the app is “damaged” or “unrecognized” on first launch. It isn't — it's just not stamped by Apple or Microsoft. One terminal command clears the warning. Full instructions are in the welcome email and in Quickstart §2.

We don't pay the certificate-of-trust tax. Three reasons:

Building software should be free. The web didn't require Apple's or Microsoft's permission to publish; neither should desktop software. A $99 / $299 annual gate on shipping is a tax the platform owners extract from every developer, and we won't compound it onto our customers.
The cert doesn't verify trust. Anyone with $99 can get an Apple Developer ID — including bad actors. The check is “did this person have a working credit card,” not “is this software safe.”
Zero security benefit for a localhost daemon. inference-relay binds 127.0.0.1only; it never sees the network. The signing apparatus exists to vouch for software shipped over an untrusted distribution channel — that's not us.

v1.0 (npm library) — still supported

Existing v1.0 customers are NOT being moved. The npm library continues to ship and continues to receive bugfix releases. New features land in v1.1+ first; v1.0 catches up when the change applies.

If your stack is Node-only and you want zero external processes, v1.0 remains the right answer.

Install + try

macOS — Apple silicon: inference-relay_1.1.14_aarch64.dmg
macOS — Intel: inference-relay_1.1.14_x64.dmg
Windows: inference-relay_1.1.14_x64-setup.exe
Linux: AppImage, .deb, .rpm
Quickstart: /docs/daemon/quickstart
Migration from v1.0: /docs/daemon/migration-from-v1

Acknowledgments

Strider integration (a long-running agent orchestrator) drove much of the v1.1 testing surface and surfaced the headless-operation unlock that became a first-class feature.

Questions? Email support@inference-relay.com.