inference-relay
One library. Pennies for orchestration.
Choose the plan that fits your infrastructure.
PATENT PENDING
| Feature | Solo ($50/mo) | Pro ($100/mo) | Enterprise (Custom) |
| --- | --- | --- | --- |
| Provider cascade | ✓ | ✓ | ✓ |
| Streaming | ✓ | ✓ | ✓ |
| Auto-patch integration | ✓ | ✓ | ✓ |
| Usage analytics | ✓ | ✓ | ✓ |
| Monthly call limit | 3,000 | 15,000 | Custom |
| Warm process pool | - | ✓ | ✓ |
| Advanced routing DSL | - | ✓ | ✓ |
| Tamper-evident audit trail | - | ✓ | ✓ |
| Fleet policy (MDM-signed) | - | - | ✓ |
| Org-wide key management | - | - | ✓ |
| SSO/SCIM | - | - | Planned |
| NDA code audit | - | - | ✓ |
| Dedicated support | - | - | ✓ |
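To put the plan comparison above in per-call terms, here is a minimal sketch that divides each listed monthly price by its listed call limit. It assumes the full monthly allowance is used and ignores top-up packs and any overage handling, which this page does not price. The field names are illustrative, not part of inference-relay.

```typescript
// Plan prices and monthly call limits, as listed in the comparison above.
const plans = [
  { name: "Solo", pricePerMonth: 50, monthlyCallLimit: 3000 },
  { name: "Pro", pricePerMonth: 100, monthlyCallLimit: 15000 },
];

// Effective price per call, assuming the whole monthly allowance is consumed.
const perCall = plans.map((p) => ({
  name: p.name,
  usdPerCall: p.pricePerMonth / p.monthlyCallLimit,
}));

for (const p of perCall) {
  console.log(`${p.name}: $${p.usdPerCall.toFixed(4)} per call`);
}
// Solo: $0.0167 per call
// Pro: $0.0067 per call
```

At full utilization that is roughly 1.7 cents per call on Solo and 0.7 cents on Pro. Lower utilization raises the effective per-call price proportionally.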
Economic Impact Analysis
Monthly API Spend (slider: $0, $100, $1K, $10K, $50K)
Current Monthly Cost: $1,200
Relay Monthly Cost: $20.40
Monthly Savings: $1,179.60
Current Gross Margin: 15%
Relay Gross Margin: 98.3%
Projected Annual Savings: $14,155.20
Frequently Asked Questions
Getting Started
How do I install it?
How do I get a license key?
How do I see my usage?
Where are the docs?
Where do I report a bug?
Where's the Discord?
Can my end users reverse engineer my app through the relay?
If the relay code is visible, how are my secrets protected?
Pricing & Business Model
Why is it paid?
Why not free with paid features?
What happens if I install without a license key?
Is there a usage cap? What happens if I hit it?
Why $50/mo and not $5? Isn't that too much for a library?
Is there an open-source / non-commercial tier?
Do you offer self-hosting?
What about non-profit / academic use?
Can I get a refund?
Can I upgrade mid-cycle?
What are top-up packs?
Can I buy multiple Solo licenses to get more capacity?
Use Cases & Positioning
Who is this for?
What's the killer use case?
Is this for chatbots?
Is this for code review tools?
Will my users notice?
What if my users don't have a Claude subscription?
Can I use this for a B2C app?
Can I use this for a free app I distribute?
Technical / How It Works
How is this different from LiteLLM?
How is this different from running `claude --print` myself?
Does it work with the OpenAI SDK?
Does it work with LangChain?
Does it work with the Vercel AI SDK?
Does streaming work?
Does it work with multimodal (images, PDFs)?
What about tools / function calling?
How much overhead does the CLI subscription path add?
What's the cold start latency?
Will this work in Docker / containers?
Does it work on Windows / Linux / macOS?
Does it work in the browser?
What about Bun, Deno, edge runtimes?
How does it handle rate limits?
What if the CLI subprocess crashes?
Why TypeScript and not Python?
Is the source open?
Comparison
Why not just use the Anthropic API directly?
Why not use OpenRouter / Together / a cheaper model provider?
Why not LiteLLM?
Why not just have users provide API keys (BYOK)?
Why not use Anthropic models through Amazon Bedrock or Google Vertex AI?
Why not OpenClaw / Claude Code Reverse / [other tools]?
Privacy & Security
How do I verify the "we never see your prompts" claim?
What metadata do you collect?
Can I run inference-relay without telemetry?
Can I run inference-relay completely air-gapped?
Do you read user credentials from the keychain?
What happens if my user denies the keychain prompt?
Is the relay traffic encrypted?
Have you been audited?
What's your data retention policy?
Does the library make outbound network calls I can't see?
Operational
What if your license validation backend is down?
What happens if Claude Code CLI is updated?
What happens if my Claude subscription hits its rate limit?
Does my user need to keep Claude Code running?
Can I disable the relay path for specific users?
How do I debug a fallback?
Can I use the dashboard standalone?
What's the MCP server?
How do I manage keys across a team?
What counts as a call?
Compliance, Legal, Anthropic
Won't Anthropic ban this?
How is this different from OpenClaw (the tool Anthropic blocked)?
The April 4, 2026 Anthropic email said enforcement applies to "all third-party harnesses." Doesn't that include you?
Are you reselling Claude access?
What's your relationship with Anthropic?
Why didn't you ask Anthropic for permission first?
What if Anthropic changes its terms tomorrow?
What if Claude CLI changes its NDJSON output format?
Has Anthropic responded to inference-relay publicly?
What if a user violates Anthropic's AUP through your library?
Patent / IP
What's the IP situation?
Can I implement my own version?
Start relaying in 60 seconds.