SDK Integration

The change

- baseURL: "https://api.anthropic.com"
+ baseURL: "http://localhost:7421"

That's the entire integration. The daemon accepts the Anthropic /v1/messages HTTP shape — same request body, same response shape, same content blocks, same tools field, same image/document blocks.

Your SDK doesn't know the difference. Your model doesn't either.

Python (anthropic SDK)

from anthropic import Anthropic

client = Anthropic(
    api_key="unused",
    base_url="http://localhost:7421",
)

msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Reply with: OK"}],
)
print(msg.content[0].text)

Streaming note: SSE streaming via stream: true is available from v1.1.9. Earlier v1.1 builds parse the field but return a buffered JSON response regardless. See the "Streaming (v1.1.9+)" section below.

TypeScript / Node (@anthropic-ai/sdk)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "unused",
  baseURL: "http://localhost:7421",
});

const msg = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Reply with: OK" }],
});
console.log(msg.content[0].text);

Go (anthropic-sdk-go)

package main

import (
    "context"
    "fmt"
    "github.com/anthropics/anthropic-sdk-go"
    "github.com/anthropics/anthropic-sdk-go/option"
)

func main() {
    client := anthropic.NewClient(
        option.WithAPIKey("unused"),
        option.WithBaseURL("http://localhost:7421"),
    )
    msg, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
        // Plain field values, as in anthropic-sdk-go v1.x
        // (earlier releases wrapped each field in anthropic.F).
        Model:     "claude-sonnet-4-6",
        MaxTokens: 1024,
        Messages: []anthropic.MessageParam{
            anthropic.NewUserMessage(anthropic.NewTextBlock("Reply with: OK")),
        },
    })
    if err != nil {
        panic(err)
    }
    fmt.Println(msg.Content[0].Text)
}

Rust (reqwest)

The daemon itself ships in Rust; dogfood example:

use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    let body = json!({
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Reply with: OK"}]
    });
    let resp: serde_json::Value = client
        .post("http://localhost:7421/v1/messages")
        .json(&body)
        .send()
        .await?
        .json()
        .await?;
    println!("{}", resp["content"][0]["text"].as_str().unwrap_or("?"));
    Ok(())
}

curl (any language with HTTP)

curl -X POST http://localhost:7421/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Reply with: OK"}]
  }'

What works without changes

Every field on the standard messages.create request shape passes through transparently. The daemon parses the Anthropic shape and forwards it unchanged: the messages[] array (each entry a role plus either a string content or an array of content blocks), top-level system prompts, and the max_tokens, temperature, and top_p parameters.

Tool use works via the daemon's bundled MCP server. Send tools[] on the request body with the standard Anthropic shape (name, description, input_schema); tool calls come back as tool_use content blocks on the response. See Tools for the round-trip protocol.
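As a minimal sketch of the request side, the snippet below builds a /v1/messages body that offers one tool in the standard Anthropic shape, then pulls tool_use blocks out of a response. The get_weather tool, its schema, and the helper names are hypothetical, and the sample response is hand-written to mirror the Anthropic shape rather than captured from the daemon.

```python
import json

# Hypothetical tool definition in the standard Anthropic shape.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def build_request(prompt: str) -> dict:
    """Build a /v1/messages body that offers one tool to the model."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "tools": [WEATHER_TOOL],
        "messages": [{"role": "user", "content": prompt}],
    }


def extract_tool_calls(response: dict) -> list[dict]:
    """Return the tool_use content blocks from a Messages response."""
    return [b for b in response.get("content", []) if b.get("type") == "tool_use"]


# Hand-written sample response, for illustration only.
sample_response = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Checking the weather."},
        {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
         "input": {"city": "Oslo"}},
    ],
}

body = build_request("What's the weather in Oslo?")
calls = extract_tool_calls(sample_response)
print(json.dumps(body["tools"][0]["name"]), calls[0]["input"])
```

Keeping the payload builder separate from the HTTP call makes the shape easy to unit-test without a running daemon.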

Vision and document attachments ship as base64 content blocks inline in messages[].content — same shape the Anthropic API expects. The daemon writes the payload to a tempfile and mentions it to claude via @path. No size cap at this layer. See Attachments.
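A minimal sketch of building such a block by hand, assuming the standard Anthropic base64 image source shape. The helper names are hypothetical, and the placeholder bytes stand in for a real file read.

```python
import base64


def image_block(data: bytes, media_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an Anthropic base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


def user_message_with_image(prompt: str, data: bytes) -> dict:
    """Build a user message holding the image block followed by a text block."""
    return {
        "role": "user",
        "content": [image_block(data), {"type": "text", "text": prompt}],
    }


# Placeholder bytes; a real caller would use open("chart.png", "rb").read().
message = user_message_with_image("What's in this chart?", b"\x89PNG placeholder")
print(message["content"][0]["source"]["media_type"])
```

The resulting dict goes directly into messages[] on the request body.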

Multi-turn messages[] history passes through verbatim. By default the daemon renders the full history into a single prompt for a fresh Session — preserving the stateless contract. Set X-IR-Session-ID to opt into sticky multi-turn where the daemon maintains conversation state across calls.

What requires opt-in headers

X-IR-Session-ID (optional) routes the call to a sticky Session for multi-turn continuity:

curl -X POST http://localhost:7421/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-IR-Session-ID: planner-loop-1" \
  -d '...'

Without the header, the daemon mints a UUID per request, so each call is stateless. Session id values must be ≤ 128 characters; longer values are silently ignored. See Sessions.
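Because over-long ids are ignored without an error, it can help to check the length on the client before sending. The sketch below uses only the standard library; build_session_request and MAX_SESSION_ID_LEN are hypothetical names, not part of the daemon or any SDK.

```python
import json
import urllib.request

MAX_SESSION_ID_LEN = 128  # limit stated in the docs above


def build_session_request(session_id: str, body: dict,
                          base_url: str = "http://localhost:7421") -> urllib.request.Request:
    """Build a POST to /v1/messages pinned to a sticky session."""
    if len(session_id) > MAX_SESSION_ID_LEN:
        # Fail loudly here rather than have the daemon silently ignore the id.
        raise ValueError(f"session id exceeds {MAX_SESSION_ID_LEN} characters")
    return urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-IR-Session-ID": session_id},
        method="POST",
    )


req = build_session_request(
    "planner-loop-1",
    {"model": "claude-sonnet-4-6", "max_tokens": 1024,
     "messages": [{"role": "user", "content": "Reply with: OK"}]},
)
# To send it with the daemon running: urllib.request.urlopen(req)
print(req.get_method(), req.full_url)
```

With the Anthropic or OpenAI SDKs the same header can usually be attached through the client's custom-header options instead of a hand-built request.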

Streaming (v1.1.9+)

SDK calls with stream: true receive a standard Anthropic SSE event stream. Each text_delta arrives at ~100ms intervals (the daemon's PTY poll cadence) as Claude renders the response, rather than being buffered until end-of-turn. Tool-use blocks are emitted in a burst once the cooldown marker is observed, matching Anthropic's wire protocol.

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain monoids in two sentences."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
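For clients that read the stream without an SDK, the sketch below extracts text from SSE lines. It assumes the daemon's events follow Anthropic's documented format (content_block_delta events carrying a text_delta); the iter_text_deltas helper is hypothetical and the sample lines are hand-written, not captured from the daemon.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each text_delta found in an SSE line stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = json.loads(line[len("data:"):].strip())
        if payload.get("type") != "content_block_delta":
            continue
        delta = payload.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


# Hand-written sample in the Anthropic SSE shape, for illustration only.
sample = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]
print("".join(iter_text_deltas(sample)))
```

Any iterable of decoded lines works as input, so the same function can consume a live HTTP response body read line by line.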

What's NOT supported (today)

API keys. The api_key field is accepted but unused on the claude-cli path. The daemon authenticates via the license key entered at first launch + your Claude Code subscription. Don't put a real Anthropic API key in api_key; it has no effect.
anthropic-version header. The daemon ignores it; the SDK and the daemon always speak the same Anthropic wire shape.
anthropic-beta features not present in the bundled Claude Code. If your code uses a beta block, the daemon may pass it to claude and claude may not understand it.
metadata, service_tier request fields — accepted, no-op.

Verification

# is the daemon up?
curl http://localhost:7421/v1/health

# is the license valid?
curl http://localhost:7421/v1/license | jq .license.valid

# how's the pool?
curl http://localhost:7421/v1/sessions | jq .pool
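The first check can also run from application code at startup. This is a minimal sketch that treats any HTTP 200 from /v1/health as "up"; it makes no assumption about the response body, and daemon_is_up is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def daemon_is_up(base_url: str = "http://localhost:7421", timeout: float = 2.0) -> bool:
    """Return True if GET /v1/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("daemon up" if daemon_is_up() else "daemon not reachable")
```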

OpenAI SDK (v1.1.15+)

The daemon also accepts the OpenAI Chat Completions wire shape at POST /v1/chat/completions. Apps coded against openai-python / openai-node / any OpenAI-compatible SDK point at IR with the same one-line change. Translation is bidirectional: request body, response shape, tool calling, and streaming all work.

from openai import OpenAI

client = OpenAI(
    api_key="unused",                      # required by SDK, ignored by daemon
    base_url="http://localhost:7421/v1",   # the only line that changes
)

resp = client.chat.completions.create(
    model="gpt-4o",                        # passed through; daemon's claude doesn't care
    messages=[{"role": "user", "content": "Reply with: OK"}],
    max_tokens=30,
)
print(resp.choices[0].message.content)

Streaming works identically:

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count 1 to 5"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Tool calling works on the OpenAI function-calling shape, which is passed straight through to claude as Anthropic tool_use blocks. tool_calls come back on the response message; round-trip tool-role messages with tool_call_id + content for the next turn.
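The round trip can be sketched as follows, assuming the standard OpenAI function-calling shapes. The get_weather tool, its stand-in implementation, and the tool_result_messages helper are hypothetical, and the assistant message is hand-written rather than captured from the daemon.

```python
import json

# Hypothetical tool in the OpenAI function-calling shape; pass as tools=TOOLS.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city: str) -> str:
    """Stand-in implementation; a real app would call a weather service."""
    return f"Sunny in {city}"


def tool_result_messages(assistant_message: dict) -> list[dict]:
    """Run each requested tool call and build the tool-role follow-up messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        # In the OpenAI shape, arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": get_weather(**args),
        })
    return results


# Hand-written assistant message, for illustration only.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_example",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Oslo\"}"},
    }],
}

history = [{"role": "user", "content": "Weather in Oslo?"}, assistant_message]
history += tool_result_messages(assistant_message)
print(history[-1])
```

The extended history is then sent as messages on the next chat.completions.create call.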

Vision (v1.1.16+): image_url blocks with data: URLs work transparently. Claude reads the embedded image via the daemon's attachment writer. Remote http(s) URLs are not supported (they would add an SSRF surface for marginal benefit); fetch the image yourself and pass it as data:image/png;base64,....

import base64
from openai import OpenAI

client = OpenAI(api_key="unused", base_url="http://localhost:7421/v1")
with open("chart.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this chart?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
    max_tokens=300,
)
print(resp.choices[0].message.content)

Not supported: legacy function_call field (use tools + tool_calls instead), audio modalities.

SDK quirks

A few notes that have surfaced in production integrations:

Either SDK works. Anthropic SDK against /v1/messages OR OpenAI SDK against /v1/chat/completions. Same daemon, same subscription, two wire shapes. Pick whichever your app is already written for.
The Anthropic SDK retries 429s by default. The daemon doesn't emit 429s (no rate limit at this layer — your subscription pool is the limit). Disable SDK retries if you want explicit error handling.
Long contexts work — the daemon bracketed-paste-routes prompts through the PTY, no argv length cap. Verified with the Bitcoin whitepaper (21KB).

Where to go next

Pin a sticky session for multi-turn → Sessions
Build an agent orchestrator → Agents Cookbook
Tool use → Tools
Vision / PDF / text attachments → Attachments
Full HTTP surface → API Reference