Sessions

The model

A Session is one PTY bound to one Claude Code subprocess with its own conversation memory. The daemon's Session Pool owns every Session: two Pre-warmed PTYs sit idle at the prompt and are replenished asynchronously. Active sticky Sessions accumulate without a hard cap; an idle-reaper drops Sessions untouched for 30 minutes so zombie PTYs don't leak.

Memory isolation is by construction: each Session gets its own PTY and its own claude subprocess. There is no shared state to cross-contaminate. Pin 100 parallel agents via 100 distinct X-IR-Session-ID values and the daemon spawns 100 isolated Sessions, bounded only by your machine's RAM.

Every /v1/messages call binds to a Session. The choice of which Session is controlled by one optional request header:

  • X-IR-Session-ID absent / empty: Daemon mints a new UUID per request → fresh Session per call → stateless.
  • X-IR-Session-ID stable string (≤128 chars): Daemon looks up that id in the Pool → sticky Session, reused across calls.

The default — no header — honors the Anthropic SDK's Stateless Contract: two consecutive client.messages.create() calls observe independent claude state, exactly as if they hit the metered API.
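The header rule above can be sketched as a small function. This illustrates the documented behavior and is not the daemon's actual code: resolve_session_id is a hypothetical name, and rejecting ids longer than 128 characters is an assumption, since the text does not say how over-long ids are handled.

```python
import uuid
from typing import Mapping, Tuple

MAX_SESSION_ID_LEN = 128  # limit stated for sticky ids


def resolve_session_id(headers: Mapping[str, str]) -> Tuple[str, bool]:
    """Return (session_id, sticky) for one /v1/messages request."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("x-ir-session-id", "")
    if raw == "":
        # Absent or empty: mint a new UUID, which yields a fresh, stateless Session.
        return str(uuid.uuid4()), False
    if len(raw) > MAX_SESSION_ID_LEN:
        # Assumption: the source does not describe the over-length case.
        raise ValueError("X-IR-Session-ID exceeds 128 characters")
    # Stable string: the caller gets the sticky Session bound to this id.
    return raw, True
```

Returning the sticky flag alongside the id lets the caller decide whether to dispose of the Session after the response (stateless) or keep it in the Pool (sticky).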

Stateless by default

Most SDK consumers want this. The messages[] array carries the conversation history; the daemon renders it as a single prompt to a fresh Session and disposes of the Session on return.

curl -X POST http://localhost:7421/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 200,
    "messages": [
      {"role": "user", "content": "What is 7 × 9?"},
      {"role": "assistant", "content": "63."},
      {"role": "user", "content": "Double that."}
    ]
  }'
# {"content":[{"type":"text","text":"126."}],"usage":{...}}

A second call right after sees no memory of the first — same as the Anthropic API:

curl -X POST http://localhost:7421/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":50,
       "messages":[{"role":"user","content":"What number did we just compute?"}]}'
# Model has no context. Answer reflects that.

Pool overhead per call: ~10 ms (grab a Pre-warmed PTY). Spawn cost is amortized — the background replenisher refills the idle pool while your code runs.
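The same stateless call could be issued from Python roughly as follows. This is a minimal sketch using only the standard library; build_stateless_request and BASE_URL are illustrative names, not part of the daemon. The point it shows is that the whole history travels in messages[] and no X-IR-Session-ID header is sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7421"  # address used in the curl examples


def build_stateless_request(history, model="claude-sonnet-4-6", max_tokens=200):
    """Build a POST /v1/messages request that carries its own history."""
    body = json.dumps(
        {"model": model, "max_tokens": max_tokens, "messages": history}
    ).encode("utf-8")
    # No X-IR-Session-ID header is set, so the daemon treats the call as stateless.
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


history = [
    {"role": "user", "content": "What is 7 × 9?"},
    {"role": "assistant", "content": "63."},
    {"role": "user", "content": "Double that."},
]
request = build_stateless_request(history)
# To send it, call urllib.request.urlopen(request) while the daemon is running.
```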

Sticky Sessions

Set X-IR-Session-ID to a stable string and the daemon binds that id to a persistent Session. Subsequent calls with the same id reuse the same PTY and claude conversation memory:

curl -X POST http://localhost:7421/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-IR-Session-ID: planner-1" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":200,
       "messages":[{"role":"user","content":"Remember: my favorite color is teal."}]}'

curl -X POST http://localhost:7421/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-IR-Session-ID: planner-1" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":50,
       "messages":[{"role":"user","content":"What did I say my favorite color was?"}]}'
# claude has memory of the prior turn → "teal"

Sticky Sessions are the right pattern for Orchestrator Processes: long-running agents that drive multi-turn conversation through a single planner loop. Choose a small set of stable identifiers (one per logical conversation) and reuse them across the orchestrator's lifetime.

What sticky Sessions are NOT for: per-message identifiers. Every new id spawns a fresh PTY (~2 s cold-spawn after the warm pool empties), and each one stays resident and eats RAM until the idle-reaper sweeps it 30 minutes later. Reuse a small, stable set of ids across an orchestrator's lifetime; clean up with DELETE /v1/sessions/<id> when an agent finishes if you don't want to wait for the reaper.
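One way to make the cleanup hard to forget is to tie the sticky id to a scope. The sketch below is an assumption about client-side structure, not something the daemon provides: sticky_session is a hypothetical helper, and percent-encoding the id in the URL is a guess at how unusual characters should be handled.

```python
import contextlib
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7421"  # address used in the curl examples


@contextlib.contextmanager
def sticky_session(session_id, send=urllib.request.urlopen):
    """Yield the header dict for one sticky Session, then delete the Session."""
    try:
        yield {"X-IR-Session-ID": session_id}
    finally:
        # DELETE /v1/sessions/<id> frees the PTY without waiting for the reaper.
        quoted = urllib.parse.quote(session_id, safe="")
        response = send(
            urllib.request.Request(f"{BASE_URL}/v1/sessions/{quoted}", method="DELETE")
        )
        # Close the HTTP response if the transport returned one.
        close = getattr(response, "close", None)
        if close is not None:
            close()
```

Inside the with block, pass the yielded dict as extra headers on every request for that agent; the DELETE is sent when the block exits, even on an exception. The send parameter exists only so the transport can be swapped out in tests.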

Many sticky Sessions in parallel

Pin N agents at once with N distinct ids. Each gets its own PTY + isolated claude conversation memory; the daemon spawns as many as you ask for. Example: a planner driving 5 workers + a critic:

import asyncio, anthropic

client = anthropic.AsyncAnthropic(api_key="unused", base_url="http://localhost:7421")

async def agent(role: str, prompt: str):
    return await client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}],
        extra_headers={"X-IR-Session-ID": role},
    )

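async def main():
    # Run all seven agents concurrently; each distinct id pins its own Session.
    return await asyncio.gather(
        agent("planner-1", "..."),
        agent("worker-1", "..."), agent("worker-2", "..."),
        agent("worker-3", "..."), agent("worker-4", "..."),
        agent("worker-5", "..."),
        agent("critic-1",  "..."),
    )

# A bare top-level await only works in a notebook or async REPL.
results = asyncio.run(main())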

Seven concurrent Sessions, seven isolated PTYs, no cross-talk. Subsequent calls with the same role-id stay pinned to the same PTY and pick up the prior conversation.

Practical ceiling: a typical claude subprocess holds ~150–250 MB resident. 50 parallel agents ≈ 8–12 GB. The daemon won't stop you from going further; your kernel will.
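The arithmetic behind that ceiling is simple enough to script. A rough sketch, assuming the 150–250 MB per-subprocess range quoted above and decimal units (1 GB = 1000 MB); the function names are illustrative.

```python
def estimate_resident_gb(agents, mb_low=150, mb_high=250):
    """Return (low, high) resident-memory estimates in GB for N sticky Sessions."""
    return agents * mb_low / 1000, agents * mb_high / 1000


def max_agents(ram_budget_gb, mb_per_agent=250):
    """Largest agent count that fits the budget at a pessimistic per-agent size."""
    return int(ram_budget_gb * 1000 // mb_per_agent)


low, high = estimate_resident_gb(50)  # 7.5 and 12.5, i.e. roughly 8–12 GB
budget = max_agents(16)               # 64 agents in a 16 GB budget at 250 MB each
```

Measure the real resident size on your own machine before relying on these numbers; the per-agent figures are the approximate ones given in this section.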

Resetting a Sticky Session

To wipe a Sticky Session's conversation memory without losing the id:

curl -X POST http://localhost:7421/v1/sessions/planner-1/reset
# {"ok":true,"session_id":"planner-1","latency_ms":12}

The daemon drops the existing PTY (kills the claude subprocess) and binds a fresh Pre-warmed PTY to the same id. ~10 ms in the happy path. Useful between distinct conversations on the same logical agent identity.

Deleting a Sticky Session

curl -X DELETE http://localhost:7421/v1/sessions/planner-1
# {"ok":true,"session_id":"planner-1"}

Drops the PTY entirely. The next call with that id spawns a fresh Session from the Pool. Use this for cleanup at orchestrator shutdown.

Inspecting the Pool

curl http://localhost:7421/v1/sessions
# {"sessions":[],"pool":{"idle":2,"active":1,"spawning":0}}

idle = Pre-warmed PTYs ready to serve.
active = currently bound to a Sticky Session id.
spawning = background replenisher in flight.

idle and spawning are bounded by TARGET_IDLE (default 2). active is unbounded; the idle-reaper sweeps every 5 minutes and drops Sessions whose last access is older than 30 minutes.
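The reaper rule can be modeled in a few lines. This is a sketch of the described policy, not the daemon's implementation: reap_idle and the last_access table are hypothetical, and only the 30-minute timeout and 5-minute sweep come from the text.

```python
import time

IDLE_TIMEOUT_S = 30 * 60   # Sessions untouched this long are dropped
SWEEP_INTERVAL_S = 5 * 60  # how often the reaper is assumed to run


def reap_idle(last_access, now=None):
    """Remove expired ids from last_access (id -> timestamp) and return them."""
    now = time.monotonic() if now is None else now
    expired = [sid for sid, ts in last_access.items() if now - ts > IDLE_TIMEOUT_S]
    for sid in expired:
        # A real daemon would also kill the PTY and claude subprocess here.
        del last_access[sid]
    return expired
```

If the sweep really is periodic, a Session may survive somewhat past the 30-minute mark, up to one sweep interval in the worst case, so treat the timeout as a lower bound rather than an exact deadline.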

Pool sizing — when to worry

The default warm-pool size (2 idle) covers single-developer workloads comfortably. Three profiles to watch:

  • Burst rate >2 stateless calls in <2 seconds — pool empties before the background replenisher refills. Calls 3+ wait for synchronous spawn (~2 s each). For sustained high-throughput stateless workloads, batch calls or pin sticky Sessions so the daemon doesn't churn PTYs.
  • Pinning >30–50 parallel sticky agents — RAM, not the daemon, becomes the ceiling. ~200 MB per claude subprocess; 50 sessions ≈ 10 GB resident. The daemon will keep spawning; the kernel will eventually OOM-kill something. Cap your agent identity space to what your machine can hold.
  • Sticky Sessions losing context after 30 min — the idle-reaper has dropped them. Either drive them more often than every 30 min, or accept the reset and re-prime the conversation on next use.
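To get a feel for the burst case, the pool overhead can be estimated with a deliberately crude model. It assumes no replenishment happens during the burst and that every cold spawn costs the full ~2 s; burst_wait_seconds is an illustrative name and the timings are the approximate figures quoted in this section.

```python
def burst_wait_seconds(calls, idle=2, warm_grab_s=0.010, cold_spawn_s=2.0):
    """Per-call pool overhead for a burst that arrives faster than the pool refills."""
    waits = []
    for index in range(calls):
        if index < idle:
            # A Pre-warmed PTY is still available.
            waits.append(warm_grab_s)
        else:
            # Pool is empty: the call waits for a synchronous spawn.
            waits.append(cold_spawn_s)
    return waits


overhead = burst_wait_seconds(4)  # two warm grabs, then two cold spawns
```

If the modeled overhead is unacceptable for your workload, that is the signal to batch calls or move to a small set of sticky ids, as the first bullet suggests.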

Where to go next