Attachments

The model

The daemon accepts any image or document content block in the standard Anthropic shape on the /v1/messages request body. Behind the scenes, it base64-decodes the payload, writes it to a tempfile, and references that path in the prompt to claude using @path syntax. claude then reads the file and processes it normally.

There is no size cap at the daemon layer. The practical ceiling is set by your subscription's context window and the model's vision token budget.
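To make the decode-and-reference step concrete, here is a minimal Python sketch of what the translation could look like. It is an illustration under assumptions, not the daemon's code (the daemon is an Axum server): the function names materialize_block and content_to_prompt, the extension table, the "-attachment" filename suffix, and joining blocks with blank lines are all hypothetical. It walks the content array in order, so attachments are mentioned where they appeared.

```python
from __future__ import annotations

import base64
import uuid
from pathlib import Path

# Hypothetical media_type -> extension table covering the types this page lists.
EXTENSIONS = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/webp": ".webp",
    "application/pdf": ".pdf",
    "text/plain": ".txt",
    "text/markdown": ".md",
}


def materialize_block(block: dict, attach_dir: Path) -> Path:
    """Decode a base64 image/document block into a UUID-prefixed tempfile."""
    source = block["source"]
    if source.get("type") != "base64":
        raise ValueError("only base64 sources are supported")
    ext = EXTENSIONS[source["media_type"]]
    attach_dir.mkdir(parents=True, exist_ok=True)
    path = attach_dir / f"{uuid.uuid4()}-attachment{ext}"
    path.write_bytes(base64.b64decode(source["data"]))
    return path


def content_to_prompt(content: list[dict], attach_dir: Path) -> tuple[str, list[Path]]:
    """Flatten a content array into one prompt, replacing attachments with @path in order."""
    parts: list[str] = []
    paths: list[Path] = []
    for block in content:
        if block["type"] == "text":
            parts.append(block["text"])
        elif block["type"] in ("image", "document"):
            path = materialize_block(block, attach_dir)
            paths.append(path)
            parts.append(f"@{path}")
        else:
            raise ValueError(f"unsupported block type: {block['type']}")
    return "\n\n".join(parts), paths
```

The returned paths are what a caller would unlink once the response comes back.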

Vision (PNG / JPEG / GIF / WebP)

import base64
from anthropic import Anthropic

client = Anthropic(api_key="unused", base_url="http://localhost:7421")

with open("screenshot.png", "rb") as f:
    img_b64 = base64.standard_b64encode(f.read()).decode()

msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What error is shown?"},
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": img_b64}},
        ],
    }],
)
print(msg.content[0].text)

Supported media types: image/png, image/jpeg, image/gif, image/webp. The daemon picks the right tempfile extension from the media_type field.
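Since the daemon relies on media_type, it helps to set it correctly on the client. A small sketch, assuming you pick the type from the file suffix; image_block and the suffix table are hypothetical helpers, not part of the Anthropic SDK. Unsupported formats fail before anything is sent.

```python
from __future__ import annotations

import base64
from pathlib import Path

# Suffix -> media_type for the four image types the daemon accepts.
IMAGE_MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str | Path) -> dict:
    """Build a base64 image content block, rejecting unsupported formats early."""
    path = Path(path)
    media_type = IMAGE_MEDIA_TYPES.get(path.suffix.lower())
    if media_type is None:
        raise ValueError(f"unsupported image type: {path.suffix!r}")
    data = base64.standard_b64encode(path.read_bytes()).decode()
    return {"type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data}}
```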

Multiple images per message work — interleave text and image blocks in the content array. Order matters; claude reads top to bottom.
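One way to keep the interleaving readable is to build the content array from a flat sequence of strings and (bytes, media_type) pairs. This is a sketch; interleave is a hypothetical helper, and the order of the arguments is exactly the order claude will read.

```python
import base64


def interleave(*items) -> list[dict]:
    """Strings become text blocks; (bytes, media_type) pairs become image blocks, in order."""
    content = []
    for item in items:
        if isinstance(item, str):
            content.append({"type": "text", "text": item})
        else:
            raw, media_type = item
            content.append({
                "type": "image",
                "source": {"type": "base64", "media_type": media_type,
                           "data": base64.standard_b64encode(raw).decode()},
            })
    return content
```

For example, interleave("Before:", (before_png, "image/png"), "After:", (after_png, "image/png"), "What changed?") yields five blocks alternating text and image.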

Documents (PDF, text)

# Reuses the base64 import and the client from the vision example above.
with open("contract.pdf", "rb") as f:
    pdf_b64 = base64.standard_b64encode(f.read()).decode()

msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize the indemnification clause."},
            {"type": "document",
             "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_b64}},
        ],
    }],
)

Supported document types: application/pdf, text/plain, text/markdown. PDFs are processed by claude's native PDF reader. Plain text and markdown are read as if pasted into the prompt.
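Text and markdown documents go through the same base64 document block as PDFs. A minimal sketch of a builder that accepts either bytes or a string; document_block is hypothetical, and encoding strings as UTF-8 is an assumption about what claude expects to read from the tempfile.

```python
from __future__ import annotations

import base64

# The three document media types this page lists as supported.
DOCUMENT_MEDIA_TYPES = {"application/pdf", "text/plain", "text/markdown"}


def document_block(payload: bytes | str, media_type: str) -> dict:
    """Build a base64 document block; str payloads are UTF-8 encoded first (assumption)."""
    if media_type not in DOCUMENT_MEDIA_TYPES:
        raise ValueError(f"unsupported document type: {media_type!r}")
    raw = payload.encode("utf-8") if isinstance(payload, str) else payload
    return {"type": "document",
            "source": {"type": "base64", "media_type": media_type,
                       "data": base64.standard_b64encode(raw).decode()}}
```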

What happens to the file

The daemon writes each attachment to $TMPDIR/subscription-relay-attachments/ under a UUID-prefixed name to avoid collisions. The file exists only for the duration of the call and is unlinked when the response returns. A background sweeper runs every 15 minutes and deletes any file older than 1 hour, which cleans up files leaked by aborted calls.
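The sweep policy amounts to a periodic age check. Here is a rough Python sketch of it, assuming age is measured from the file's modification time; sweep_once and run_sweeper are hypothetical names, and the daemon's real sweeper is not shown on this page.

```python
from __future__ import annotations

import time
from pathlib import Path

SWEEP_INTERVAL_S = 15 * 60  # sweeper wakes every 15 minutes
MAX_AGE_S = 60 * 60         # files older than 1 hour are deleted


def sweep_once(attach_dir: Path, now: float | None = None,
               max_age_s: float = MAX_AGE_S) -> list[Path]:
    """Delete files whose mtime is older than max_age_s; return the removed paths."""
    now = time.time() if now is None else now
    removed: list[Path] = []
    if not attach_dir.is_dir():
        return removed
    for path in attach_dir.iterdir():
        if path.is_file() and now - path.stat().st_mtime > max_age_s:
            path.unlink(missing_ok=True)  # tolerate a concurrent unlink by the call itself
            removed.append(path)
    return removed


def run_sweeper(attach_dir: Path) -> None:
    """Loop forever, sweeping every SWEEP_INTERVAL_S seconds."""
    while True:
        sweep_once(attach_dir)
        time.sleep(SWEEP_INTERVAL_S)
```

With a 15-minute interval and a 1-hour threshold, a leaked file can survive for up to about 75 minutes before it is removed.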

No attachment content lands in ~/.inference-relay/recent-calls.jsonl — only the call metadata (size, mime type, hash) is logged. The raw bytes never persist past the call.
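A sketch of what a bytes-free log entry could look like. The field names and the choice of SHA-256 are assumptions; this page says only that size, mime type, and a hash are logged. The point is that the record is derived from the bytes without containing them.

```python
import hashlib
import json


def attachment_metadata(raw: bytes, media_type: str) -> dict:
    """Summarize an attachment without keeping its bytes (hypothetical field names)."""
    return {
        "size": len(raw),
        "mime_type": media_type,
        "sha256": hashlib.sha256(raw).hexdigest(),  # hash algorithm is an assumption
    }


def jsonl_record(attachments: list) -> str:
    """One JSONL line for a call's (bytes, media_type) attachments; raw bytes are never written."""
    return json.dumps({"attachments": [attachment_metadata(raw, mt) for raw, mt in attachments]})
```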

Size and performance notes

Base64 inflates payloads by ~33%, so a 4 MB image encodes to ~5.3 MB on the wire. The daemon's HTTP body limit is uncapped (Axum's default 2 MB request body limit is disabled in the daemon config), so you won't hit the kind of 2 MB cap that some proxies impose.
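The ~33% figure comes from base64 emitting 4 output characters for every 3 input bytes, with padding rounding up to a full group. A quick way to compute the exact encoded size (base64_len is just an illustrative helper):

```python
def base64_len(n_bytes: int) -> int:
    """Length of padded standard base64 output: 4 chars per (rounded-up) 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


print(base64_len(4_000_000))  # 5333336, i.e. ~5.3 MB before JSON framing
```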

Latency penalty per MB of attachment is roughly:

  • ~30 ms write to tempfile
  • ~200 ms to 2 s for claude's vision/document tokenization (varies by file type and size)

For multi-image batches, the per-image overhead is additive; nothing parallelizes across attachments in a single call.
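Because the costs add up, a back-of-envelope estimate is just the per-MB figures above times the total attachment size. A sketch, assuming the costs scale linearly per MB (a simplification; real tokenization time depends on content, not just size). attachment_overhead_ms is a hypothetical helper, and upload and model generation time are not included.

```python
# Rough per-MB costs quoted above, in milliseconds.
WRITE_MS_PER_MB = 30
TOKENIZE_MS_PER_MB = (200, 2000)  # (low, high)


def attachment_overhead_ms(sizes_mb):
    """Return (low, high) added latency in ms; summed because attachments are handled serially."""
    total_mb = sum(sizes_mb)
    low = total_mb * (WRITE_MS_PER_MB + TOKENIZE_MS_PER_MB[0])
    high = total_mb * (WRITE_MS_PER_MB + TOKENIZE_MS_PER_MB[1])
    return low, high


print(attachment_overhead_ms([1.0, 2.5, 0.5]))  # (920.0, 8120.0)
```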

Multi-turn with attachments

Sticky Sessions handle attached files across turns the same way the Anthropic API does: through the messages[] history. Resend the same image / document block on the relevant historical turn and claude sees it again.

import base64
import httpx

session_id = "doc-review-1"
with open("contract.pdf", "rb") as f:
    pdf_b64 = base64.standard_b64encode(f.read()).decode()

doc_block = {"type": "document",
             "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_b64}}

# httpx's default timeout is 5 s, which is too short for a document call;
# 120 s is an arbitrary, generous value.
TIMEOUT = 120.0

# Turn 1
r1 = httpx.post("http://localhost:7421/v1/messages",
                headers={"X-IR-Session-ID": session_id},
                timeout=TIMEOUT,
                json={"model":"claude-sonnet-4-6","max_tokens":1024,
                      "messages":[{"role":"user","content":[
                          {"type":"text","text":"What's the term length?"},
                          doc_block]}]})
r1.raise_for_status()

# Turn 2: the document is resent on its original turn in the messages[] history
r2 = httpx.post("http://localhost:7421/v1/messages",
                headers={"X-IR-Session-ID": session_id},
                timeout=TIMEOUT,
                json={"model":"claude-sonnet-4-6","max_tokens":1024,
                      "messages":[
                          {"role":"user","content":[
                              {"type":"text","text":"What's the term length?"},
                              doc_block]},
                          {"role":"assistant","content":r1.json()["content"]},
                          {"role":"user","content":"What about renewal?"},
                      ]})
r2.raise_for_status()
print(r2.json()["content"][0]["text"])

To avoid re-uploading the same document on every turn, agents can batch all questions about it into a single call, or use a sticky Session and let claude's conversation memory carry the context.
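Batching can be as simple as attaching the document once and listing every question in one user turn. A sketch; batched_questions is a hypothetical helper, and placing the document before the questions is a stylistic choice, not a daemon requirement.

```python
def batched_questions(doc_block: dict, questions: list) -> list:
    """One user turn: the document once, followed by all questions as a numbered list."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return [{"role": "user", "content": [
        doc_block,
        {"type": "text",
         "text": f"Answer each question about the attached document:\n{numbered}"},
    ]}]
```

The result drops straight into the "messages" field of a single request, so the PDF crosses the wire once.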

What's NOT supported

  • URL fetching: the url source type (source.type: "url") is not implemented. Only inline base64 payloads are accepted.
  • Streaming uploads: the daemon reads the full request body before processing. For very large documents (>50 MB), time to first byte includes the full upload time.
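If your source material lives at a URL, one workaround is to fetch it on the client and inline it as base64. A minimal sketch using only the standard library; url_to_block is hypothetical, and you must supply the correct media_type yourself since this does not inspect the response headers.

```python
import base64
import urllib.request


def url_to_block(url: str, media_type: str, block_type: str = "document",
                 timeout: float = 30) -> dict:
    """Download url and wrap the bytes in a base64 content block the daemon accepts."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
    return {"type": block_type,
            "source": {"type": "base64", "media_type": media_type,
                       "data": base64.standard_b64encode(raw).decode()}}
```

For images, pass block_type="image" with an image media_type.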

Where to go next