inference-relay v1.1
What's new
- Standalone Rust daemon. Binds
127.0.0.1:7421. Accepts Anthropic-shape HTTP. Drop your SDK against it from Python, Node, Go, Rust, curl — any language. - Headless operation. Install once; daemon survives application restarts. Windows Task Scheduler + macOS launchd + Linux systemd recipes included in Daemon Docs / Headless.
- Session Pool. Two pre-warmed PTYs idle at the prompt; ~10 ms warm-pool grab; ~2 s cold-spawn. (v1.1.10+ removed the 5-concurrent cap; see below.)
- Stateless by default; sticky via X-IR-Session-ID. Honors the Anthropic SDK's stateless contract for plain
/v1/messagescalls; opts into sticky multi-turn when the header is set. - Tool use, vision, attachments — works unchanged. Standard Anthropic SDK shape; daemon forwards transparently.
- ed25519-signed auto-updater. Daemon polls every 4h; signed bundles verified against an embedded pubkey.
POST /v1/sessions/<id>/reset. Pool-swap reset for sticky sessions; ~10 ms warm-pool grab.
Update — v1.1.9 (streaming + multi-provider)
- Real SSE streaming. SDK calls with
stream: truenow receive incrementaltext_deltaevents at ~100 ms intervals as Claude renders, instead of a buffered burst at end-of-turn. Tool-use blocks emit post-cooldown. - Multi-provider cascade. Daemon now routes through claude-cli (priority 1) → anthropic-api (2) → openai (3) → ollama (4). Configure API keys in settings; the cascade automatically falls back on
Unavailable. Per-call pin viaX-IR-Provider: <id>header. - Linux installers. .AppImage, .deb, and .rpm packages now shipping. macOS Intel users get a real .dmg (no more extract+drag tar.gz). All four platforms now built via GitHub Actions CI.
- First-launch hello-world test. The activation wizard now runs a tiny end-to-end test call after license validation, so you see Claude responding before being released to the dashboard. Catches the “license valid but Claude Code unauthenticated” case in-context.
Update — v1.1.10 (unlimited concurrent sessions)
- No hard cap on sticky sessions. Removed the v1.1.9 LRU eviction at 5 concurrent. Devs running multi-agent topologies (planner + N workers + critic, swarm-of-50, etc.) pin arbitrary parallel sessions via
X-IR-Session-ID; each gets its own isolated PTY with no cross-contamination. - Idle reaper replaces the cap. Sessions idle >30 min get GC'd (5-min sweep). Mid-task pauses (human thinking, paste, re-run) stay well under the threshold; zombie sessions get released so claude children don't accumulate.
Update — v1.1.11 (quality-of-life)
- Login-shell PATH probe for claude resolution. When the static fallback list misses (asdf, mise, fnm, custom npm prefix, corp-IT PATH injection), the daemon now execs
$SHELL -ilc 'which claude'to read whatever PATH your.zshrc/.zshenvconfigures. Should resolve ~95% of the “claude binary not found” cases that hit non-Homebrew installs. Also added~/.local/bin,~/.nodenv/shims,~/.fnm/aliases/default/bin, and a walk of~/.nvm/versions/node/v*/binas static candidates. - Event-driven auto-update (no more 4h timer). The Tauri shell now waits on a
tokio::sync::Notifysignal; the daemon's/v1/messageshandler fires it (debounced to ≤1/5min) after each successful call. Idle daemons stay idle; active daemons get updates within ~5 min of next call after a new release. Plus one eager check 30 s after launch so freshly-installed copies don't lag a version behind. /v1/versionreportsclaudePath. Was hardcodedclaudeAvailable: false(TODO leftover). Now accurate + includes the resolved path so customers can diagnose where the daemon looked.- Wizard claude-not-found remediation. Failed-state Card detects “claude binary not found” patterns in the daemon error and surfaces a clean install one-liner (
npm i -g @anthropic-ai/claude-codethenclaude) instead of the raw error string.
Update — v1.1.13 (sticky-session fix + QoL)
- Sticky-session first-call regression — properly fixed. v1.1.10–v1.1.12 had a subtle bug where the first call to a fresh
X-IR-Session-IDreturnedcontent: []. claude's✻ Cogitating for Nsmarker appears at the START of processing, not the end — daemon read this as “turn over” before claude wrote the response. Now: if extraction returns empty when the turn looks done, treat it as a false positive and keep polling. Verified 3-way concurrent test: 3/3 in 7s (was 0/3 on v1.1.11, 2/3 on v1.1.12). - Tray “Check for updates” — now wired. Click handler was a no-op since v1.1. Now calls the same check + auto-install as the Settings button and surfaces the result via native notification.
- Auto-install on manual update check. The renderer's “Check for updates” button previously only checked and alerted “update available.” Now downloads + installs in place via the Tauri updater (ed25519-verified, app-bundle swap, restart).
- Allow-tools checkbox now persists in Tauri builds.
window.confirm()is silently disabled in Tauri 2; the dangerous-toggle gate was always failing closed. Replaced with a 2-click arm-then-confirm pattern + inline warning that auto-cancels after 5s. - Recent Activity prompt previews. Dashboard's Recent Activity rows now show the first ~120 chars of the user prompt instead of “(no prompt preview).”
Update — v1.1.14 (live activity log + agent inspector + doctor)
- Live activity log. Dashboard now streams new calls in real time via SSE (
/v1/activity/stream). No polling lag — calls appear the moment the daemon persists them. Slow consumers drop events (broadcast capacity 256); the dashboard's 4s poll reconciles via thelaggedmarker. - Agent Inspector. Click any Recent Activity row to open a modal showing the call ID, session ID, model, tokens, cost-avoided, error if any, full prompt, and the raw PTY transcript exactly as captured between cooldowns. Builds trust + helps SDK consumers debug their integrations.
inference-relay doctorCLI. New subcommand runs 5 diagnostic checks: claude binary resolvable, settings.json readable, license configured, daemon /v1/health reachable, update server reachable. Output: PASS/WARN/FAIL with detail.--jsonfor machine-readable. Customers paste the output into support requests; 5-second triage.- Cost-avoided headline. Dashboard surfaces
$X.XX saved vs Anthropic API ratesas a hero card when usage accumulates. Value prop made visible every time you open the app. - Wizard preflight. License-activation wizard now probes
/v1/version'sclaudeAvailablefield BEFORE firing the hello-world test. If claude isn't resolvable, fails fast with the install one-liner (npm i -g @anthropic-ai/claude-code) instead of a 30-second timeout. - Backend: usage counter clamp fix. Dashboard's “Calls this cycle” was showing 0 for some customers because Stripe's
currentPeriodStartcan be in the future on a fresh license — filtering out every real event. Clamped tonowon the relay backend; both the daemon and the dashboard now agree on the count.
Update — v1.1.15 (OpenAI Chat Completions inbound)
POST /v1/chat/completions— OpenAI shape now accepted. Apps coded againstopenai-python/@openai/openai-node/ etc. point at IR with one line change (base_url→http://localhost:7421/v1). Translator handles request body (system / user / assistant / tool messages), response shape (choices[0].message.content+tool_calls+usage),stop_reasonmapping (tool_use→tool_calls,max_tokens→length, etc.), and tools (function calling round-trip).- Streaming. OpenAI SSE shape (
data: {choices: [{delta: {...}}]}chunks ending withdata: [DONE]) emitted by an internal state-machine translator. Text + tool-call argument deltas stream across chunks matching the wire spec. - Same SDK guarantees as the Anthropic side. License gate, cap-exceeded,
X-IR-Session-IDsticky sessions,X-IR-Providerhard-pinning all honored. Routes through the same cascade as/v1/messages. - Vision via
image_urlcoming in v1.1.16 (next bullet). Initial v1.1.15 cut shipped text + tools; vision support followed immediately.
Update — v1.1.16 (OpenAI vision)
image_urlcontent blocks supported on/v1/chat/completions. Pass images asdata:image/png;base64,...URLs in the OpenAI-shape user message; the daemon translates to Anthropic image blocks and writes to the existing PTY attachment pipeline. Verified: tiny 1x1 PNG + “how many pixels?” → claude correctly answers “1”.- Remote http(s) image URLs not supported — by design. Would add a network fetch + SSRF surface inside the daemon for marginal benefit; nearly every openai-python example uses data: URLs anyway. Returns a clean error pointing to the base64 workaround if a remote URL is supplied.
By design — no code-signing
v1.1 ships unsigned (no Apple Developer ID; no Windows EV cert). Your OS will warn you the app is “damaged” or “unrecognized” on first launch. It isn't — it's just not stamped by Apple or Microsoft. One terminal command clears the warning. Full instructions are in the welcome email and in Quickstart §2.
We don't pay the certificate-of-trust tax. Three reasons:
- Building software should be free. The web didn't require Apple's or Microsoft's permission to publish; neither should desktop software. A $99 / $299 annual gate on shipping is a tax the platform owners extract from every developer, and we won't compound it onto our customers.
- The cert doesn't verify trust. Anyone with $99 can get an Apple Developer ID — including bad actors. The check is “did this person have a working credit card,” not “is this software safe.”
- Zero security benefit for a localhost daemon. inference-relay binds
127.0.0.1only; it never sees the network. The signing apparatus exists to vouch for software shipped over an untrusted distribution channel — that's not us.
v1.0 (npm library) — still supported
Existing v1.0 customers are NOT being moved. The npm library continues to ship and continues to receive bugfix releases. New features land in v1.1+ first; v1.0 catches up when the change applies.
If your stack is Node-only and you want zero external processes, v1.0 remains the right answer.
Install + try
- macOS — Apple silicon: inference-relay_1.1.14_aarch64.dmg
- macOS — Intel: inference-relay_1.1.14_x64.dmg
- Windows: inference-relay_1.1.14_x64-setup.exe
- Linux: AppImage, .deb, .rpm
- Quickstart: /docs/daemon/quickstart
- Migration from v1.0: /docs/daemon/migration-from-v1
Acknowledgments
Strider integration (a long-running agent orchestrator) drove much of the v1.1 testing surface and surfaced the headless-operation unlock that became a first-class feature.
Questions? Email support@inference-relay.com.