SDK Integration

The change

- baseURL: "https://api.anthropic.com"
+ baseURL: "http://localhost:7421"

That's the entire integration. The daemon accepts the Anthropic /v1/messages HTTP shape — same request body, same response shape, same content blocks, same tools field, same image/document blocks.

Your SDK doesn't know the difference. Your model doesn't either.

Python (anthropic SDK)

from anthropic import Anthropic

client = Anthropic(
    api_key="unused",
    base_url="http://localhost:7421",
)

msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Reply with: OK"}],
)
print(msg.content[0].text)

Streaming note: SSE streaming via stream: true is available from v1.1.9 onward. Earlier v1.1 builds parse the field but return a buffered JSON response regardless. See the "Streaming (v1.1.9+)" section below.

TypeScript / Node (@anthropic-ai/sdk)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "unused",
  baseURL: "http://localhost:7421",
});

const msg = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Reply with: OK" }],
});
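// content is a union of block types, so narrow to a text block before reading .text
const first = msg.content[0];
if (first.type === "text") {
  console.log(first.text);
}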

Go (anthropic-sdk-go)

package main

import (
    "context"
    "fmt"
    "github.com/anthropics/anthropic-sdk-go"
    "github.com/anthropics/anthropic-sdk-go/option"
)

func main() {
    client := anthropic.NewClient(
        option.WithAPIKey("unused"),
        option.WithBaseURL("http://localhost:7421"),
    )
    msg, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
        Model:     anthropic.F(anthropic.ModelClaudeSonnet4_6),
        MaxTokens: anthropic.F(int64(1024)),
        Messages: anthropic.F([]anthropic.MessageParam{
            anthropic.NewUserMessage(anthropic.NewTextBlock("Reply with: OK")),
        }),
    })
    if err != nil {
        panic(err)
    }
    fmt.Println(msg.Content[0].Text)
}

Rust (reqwest)

The daemon itself is written in Rust, so here is a dogfooding example that talks to it over plain HTTP:

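// Cargo deps: reqwest (with the "json" feature), serde_json, tokio (with "macros" and "rt-multi-thread").
use serde_json::json;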

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    let body = json!({
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Reply with: OK"}]
    });
    let resp: serde_json::Value = client
        .post("http://localhost:7421/v1/messages")
        .json(&body)
        .send()
        .await?
        .json()
        .await?;
    println!("{}", resp["content"][0]["text"].as_str().unwrap_or("?"));
    Ok(())
}

curl (any language with HTTP)

curl -X POST http://localhost:7421/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Reply with: OK"}]
  }'

What works without changes

Every field on the standard messages.create request shape passes through transparently. The messages[] array (with role plus either a string content or an array of content blocks), top-level system prompts, sampling parameters (max_tokens, temperature, top_p) — the daemon parses the Anthropic shape and forwards unchanged.
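As a small illustration of the fields listed above, the sketch below assembles a request body with a top-level system prompt, sampling parameters, and block-style content. The build_request helper is hypothetical (it is not part of any SDK or of the daemon); only the field names come from the standard request shape.

```python
import json


def build_request(prompt, system=None, temperature=None, top_p=None, max_tokens=1024):
    """Assemble a /v1/messages body, including optional fields only when set."""
    body = {
        "model": "claude-sonnet-4-6",
        "max_tokens": max_tokens,
        # content may be a plain string or, as here, a list of content blocks
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    }
    if system is not None:
        body["system"] = system
    if temperature is not None:
        body["temperature"] = temperature
    if top_p is not None:
        body["top_p"] = top_p
    return body


body = build_request("Reply with: OK", system="You are terse.", temperature=0.2)
print(json.dumps(body, indent=2))
```

The resulting dict can be posted as JSON to /v1/messages, or its keys passed as keyword arguments to client.messages.create.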

Tool use works via the daemon's bundled MCP server. Send tools[] on the request body with the standard Anthropic shape (name, description, input_schema); tool calls come back as tool_use content blocks on the response. See Tools for the round-trip protocol.
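A rough sketch of the shapes involved: the snippet below builds a request body that advertises one tool and pulls tool_use blocks out of a response. The get_weather tool, its schema, and the sample response are hypothetical and hand-written for illustration; only the field names (name, description, input_schema, tool_use) come from the standard Anthropic shape.

```python
import json

# Hypothetical tool definition; the name and schema are illustrative only.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def build_tool_request(prompt, tools):
    """Return a /v1/messages request body that advertises the given tools."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "tools": tools,
        "messages": [{"role": "user", "content": prompt}],
    }


def extract_tool_calls(response):
    """Return (id, name, input) for every tool_use block in a response body."""
    return [
        (block["id"], block["name"], block["input"])
        for block in response.get("content", [])
        if block.get("type") == "tool_use"
    ]


# Hand-written sample response, not captured from a real daemon.
sample_response = {
    "content": [
        {"type": "text", "text": "Checking the weather."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Oslo"}},
    ]
}

body = build_tool_request("What's the weather in Oslo?", [WEATHER_TOOL])
print(json.dumps(body["tools"][0]["input_schema"]["required"]))
print(extract_tool_calls(sample_response))
```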

Vision and document attachments ship as base64 content blocks inline in messages[].content — same shape the Anthropic API expects. The daemon writes the payload to a tempfile and mentions it to claude via @path. No size cap at this layer. See Attachments.
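For reference, an inline image block can be built as in the sketch below. The helper names are hypothetical, and the placeholder bytes stand in for a real file; the block layout (type "image" with a base64 source) is the standard Anthropic content-block shape.

```python
import base64


def image_block(data: bytes, media_type: str = "image/png") -> dict:
    """Wrap raw image bytes in a base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


def user_message_with_image(prompt: str, data: bytes, media_type: str = "image/png") -> dict:
    """Build one user message whose content is [image block, text block]."""
    return {
        "role": "user",
        "content": [image_block(data, media_type), {"type": "text", "text": prompt}],
    }


# Placeholder bytes stand in for a real file read with open(path, "rb").
message = user_message_with_image("What's in this chart?", b"\x89PNG placeholder")
print(message["content"][0]["source"]["media_type"])
```

The returned message goes straight into the messages[] list of a normal request.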

Multi-turn messages[] history passes through verbatim. By default the daemon renders the full history into a single prompt for a fresh Session — preserving the stateless contract. Set X-IR-Session-ID to opt into sticky multi-turn where the daemon maintains conversation state across calls.

What requires opt-in headers

X-IR-Session-ID (optional) routes the call to a sticky Session for multi-turn continuity:

curl -X POST http://localhost:7421/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-IR-Session-ID: planner-loop-1" \
  -d '...'

Without the header, the daemon mints a UUID per request, so each call is stateless. Session ID values must be at most 128 characters; longer values are silently ignored. See Sessions.
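Because an over-long ID is ignored silently, a client may prefer to check the length itself. The sketch below is one way to do that; session_headers is a hypothetical helper, and the only facts taken from this page are the header name and the 128-character limit.

```python
MAX_SESSION_ID_LEN = 128  # limit stated above; longer ids are silently ignored


def session_headers(session_id: str) -> dict:
    """Return the opt-in header, failing loudly instead of silently going stateless."""
    if len(session_id) > MAX_SESSION_ID_LEN:
        raise ValueError(
            f"session id is {len(session_id)} chars; the limit is {MAX_SESSION_ID_LEN}"
        )
    return {"X-IR-Session-ID": session_id}


headers = session_headers("planner-loop-1")
print(headers)

# With the anthropic SDK the header can be attached per call, for example:
#   client.messages.create(..., extra_headers=session_headers("planner-loop-1"))
```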

Streaming (v1.1.9+)

SDK calls with stream: true receive a standard Anthropic SSE event stream. Each text_delta arrives at ~100ms intervals (the daemon's PTY poll cadence) as Claude renders the response, rather than being buffered until end-of-turn. Tool-use blocks are emitted in a burst once the cooldown marker is observed, matching Anthropic's wire protocol.

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain monoids in two sentences."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
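For clients that read the stream without an SDK, the sketch below shows one way to pull text out of the SSE lines. It assumes the standard Anthropic event names (content_block_delta carrying a text_delta); the sample stream is hand-written, not captured from the daemon.

```python
import json


def iter_text_deltas(lines):
    """Yield the text of each text_delta found in an iterable of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = json.loads(line[len("data:"):])
        if payload.get("type") != "content_block_delta":
            continue
        delta = payload.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta["text"]


SAMPLE_STREAM = """\
event: message_start
data: {"type": "message_start", "message": {"role": "assistant", "content": []}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_stop
data: {"type": "message_stop"}
"""

text = "".join(iter_text_deltas(SAMPLE_STREAM.splitlines()))
print(text)
```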

What's NOT supported (today)

  • API keys. The api_key field is accepted but unused on the claude-cli path. The daemon authenticates via the license key entered at first launch + your Claude Code subscription. Don't put a real Anthropic API key in api_key; it has no effect.
  • anthropic-version header. The daemon ignores it; the SDK and the daemon always speak the same Anthropic wire shape.
  • anthropic-beta features not present in the bundled Claude Code. If your code uses a beta block, the daemon may pass it to claude and claude may not understand it.
  • metadata, service_tier request fields — accepted, no-op.

Verification

# is the daemon up?
curl http://localhost:7421/v1/health

# is the license valid?
curl http://localhost:7421/v1/license | jq .license.valid

# how's the pool?
curl http://localhost:7421/v1/sessions | jq .pool

OpenAI SDK (v1.1.15+)

The daemon also accepts the OpenAI Chat Completions wire shape at POST /v1/chat/completions. Apps coded against openai-python, openai-node (the openai npm package), or any OpenAI-compatible SDK point at IR with the same one-line change. Translation is bidirectional: request body, response shape, tool calling, and streaming all work.

from openai import OpenAI

client = OpenAI(
    api_key="unused",                      # required by SDK, ignored by daemon
    base_url="http://localhost:7421/v1",   # the only line that changes
)

resp = client.chat.completions.create(
    model="gpt-4o",                        # passed through; daemon's claude doesn't care
    messages=[{"role": "user", "content": "Reply with: OK"}],
    max_tokens=30,
)
print(resp.choices[0].message.content)

Streaming works identically:

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count 1 to 5"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Tool calling works on the OpenAI function-calling shape, which the daemon passes to claude as Anthropic tool_use blocks. tool_calls come back on the response message; for the next turn, send tool-role messages carrying tool_call_id and content.
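The round trip can be sketched as follows. The get_weather tool and the sample assistant message are hypothetical and hand-written; the message shapes (tool_calls with a JSON-string arguments field, then role "tool" messages keyed by tool_call_id) follow the OpenAI Chat Completions format.

```python
import json

# Hypothetical tool in the OpenAI function-calling shape.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city):
    """Stand-in implementation; a real tool would call a weather service."""
    return {"city": city, "forecast": "sunny"}


def tool_result_messages(assistant_message, handlers):
    """Run each requested tool and return the messages to append for the next turn."""
    follow_up = [assistant_message]  # echo the assistant turn that carried tool_calls
    for call in assistant_message.get("tool_calls", []):
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        result = handlers[call["function"]["name"]](**args)
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return follow_up


# Hand-written assistant message, shaped like resp.choices[0].message as a dict.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
    }],
}

next_messages = tool_result_messages(assistant_message, {"get_weather": get_weather})
print(next_messages[-1])
```

Appending next_messages to the original messages list and calling client.chat.completions.create again (with tools=TOOLS) completes the loop.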

Vision (v1.1.16+): image_url blocks with data: URLs work transparently. Claude reads the embedded image via the daemon's attachment writer. Remote http(s) URLs are not supported (they would add an SSRF surface for marginal benefit); fetch the image yourself and pass it as data:image/png;base64,....

import base64
from openai import OpenAI

client = OpenAI(api_key="unused", base_url="http://localhost:7421/v1")
with open("chart.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this chart?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
    max_tokens=300,
)
print(resp.choices[0].message.content)

Not supported: legacy function_call field (use tools + tool_calls instead), audio modalities.

SDK quirks

A few notes that have surfaced in production integrations:

  • Either SDK works. Anthropic SDK against /v1/messages OR OpenAI SDK against /v1/chat/completions. Same daemon, same subscription, two wire shapes. Pick whichever your app is already written for.
  • The Anthropic SDK retries 429s by default. The daemon doesn't emit 429s (no rate limit at this layer — your subscription pool is the limit). Disable SDK retries if you want explicit error handling.
  • Long contexts work. The daemon routes prompts through the PTY using bracketed paste, so there is no argv length cap. Verified with the Bitcoin whitepaper (21KB).
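To act on the retry note above, the Anthropic Python SDK accepts max_retries on the client constructor. The sketch below turns retries off and reports failures explicitly; make_client and call_once are hypothetical helper names, and the SDK is imported lazily so the module loads even where the SDK is not installed.

```python
# Keyword arguments for the Anthropic client; max_retries=0 turns off the
# SDK's automatic retry loop so every failure surfaces on the first attempt.
CLIENT_KWARGS = {
    "api_key": "unused",
    "base_url": "http://localhost:7421",
    "max_retries": 0,
}


def make_client():
    """Build the client lazily so this module imports without the SDK installed."""
    from anthropic import Anthropic
    return Anthropic(**CLIENT_KWARGS)


def call_once(client, prompt):
    """Send one request and report failures explicitly instead of retrying."""
    from anthropic import APIConnectionError, APIStatusError
    try:
        msg = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text
    except APIConnectionError as exc:
        return f"daemon unreachable: {exc}"
    except APIStatusError as exc:
        return f"daemon returned HTTP {exc.status_code}"
```

Usage would be call_once(make_client(), "Reply with: OK"), which returns either the reply text or a one-line description of the failure.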

Where to go next