Sessions
The model
A Session is one PTY bound to one Claude Code subprocess with its own conversation memory. The daemon's Session Pool owns every Session. Two Pre-warmed PTYs sit idle at the prompt and are replenished asynchronously. Active sticky Sessions accumulate without a hard cap; an idle-reaper drops Sessions untouched for 30 minutes so zombie PTYs don't leak.
Memory isolation is by construction: each Session gets its own PTY and its own claude subprocess, so there is no shared state to cross-contaminate. Pin 100 parallel agents via 100 distinct X-IR-Session-ID values and the daemon spawns 100 isolated Sessions, bounded only by your machine's RAM.
Every /v1/messages call binds to a Session. The choice of which Session is controlled by one optional request header:
- X-IR-Session-ID absent / empty: Daemon mints a new UUID per request → fresh Session per call → stateless.
- X-IR-Session-ID stable string (≤128 chars): Daemon looks up that id in the Pool → sticky Session, reused across calls.
The default — no header — honors the Anthropic SDK's Stateless Contract: two consecutive client.messages.create() calls observe independent claude state, exactly as if they hit the metered API.
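The header rule above can be sketched in a few lines. This is an illustrative model, not the daemon's actual code: the name `resolve_session_id` and its return shape are hypothetical, and rejecting over-long ids with a `ValueError` is an assumption (the text only states the 128-character limit).

```python
import uuid

MAX_SESSION_ID_LEN = 128  # limit stated in the text above


def resolve_session_id(headers: dict[str, str]) -> tuple[str, bool]:
    """Return (session_id, sticky) for a request's headers.

    Hypothetical helper: models the documented rule, not the daemon's code.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("x-ir-session-id", "")

    if not raw:
        # Absent or empty header: mint a fresh id, so the call is stateless.
        return str(uuid.uuid4()), False

    if len(raw) > MAX_SESSION_ID_LEN:
        # Assumption: over-long ids are rejected; the real behavior may differ.
        raise ValueError("X-IR-Session-ID must be at most 128 characters")

    # Stable id: the same string always maps to the same sticky Session.
    return raw, True
```

Two calls without the header get two different ids, which is what makes the default path stateless; two calls with `planner-1` get the same id back.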
Stateless by default
Most SDK consumers want this. The messages[] array carries the conversation history; the daemon renders it as a single prompt to a fresh Session and disposes of the Session on return.
curl -X POST http://localhost:7421/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 200,
    "messages": [
      {"role": "user", "content": "What is 7 × 9?"},
      {"role": "assistant", "content": "63."},
      {"role": "user", "content": "Double that."}
    ]
  }'
# {"content":[{"type":"text","text":"126."}],"usage":{...}}
A second call right after sees no memory of the first, same as the Anthropic API:
curl -X POST http://localhost:7421/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":50,
       "messages":[{"role":"user","content":"What number did we just compute?"}]}'
# Model has no context. Answer reflects that.
Pool overhead per call: ~10 ms (grab a Pre-warmed PTY). Spawn cost is amortized: the background replenisher refills the idle pool while your code runs.
Sticky Sessions
Set X-IR-Session-ID to a stable string and the daemon binds that id to a persistent Session. Subsequent calls with the same id reuse the same PTY and claude conversation memory:
curl -X POST http://localhost:7421/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-IR-Session-ID: planner-1" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":200,
       "messages":[{"role":"user","content":"Remember: my favorite color is teal."}]}'
curl -X POST http://localhost:7421/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-IR-Session-ID: planner-1" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":50,
       "messages":[{"role":"user","content":"What did I say my favorite color was?"}]}'
# claude has memory of the prior turn → "teal"
Sticky Sessions are the right pattern for Orchestrator Processes: long-running agents that drive multi-turn conversation through a single planner loop. Choose a small set of stable identifiers (one per logical conversation) and reuse them across the orchestrator's lifetime.
What sticky Sessions are NOT for: per-message identifiers. Every new id claims its own PTY (a ~2 s cold spawn once the warm pool is empty), and each one stays resident and eats RAM until the idle-reaper sweeps it after 30 minutes of inactivity. Keep to a small, stable set of ids, and clean up with DELETE /v1/sessions/<id> when an agent finishes if you don't want to wait for the reaper.
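One way to keep the id space small is to derive ids from the logical conversation rather than from individual messages. The helper below is a hypothetical sketch: `sticky_id`, `headers_for`, and the `role-index` naming scheme are assumptions modeled on the `planner-1` style used on this page; only the 128-character limit comes from the text.

```python
MAX_SESSION_ID_LEN = 128  # limit stated for X-IR-Session-ID


def sticky_id(role: str, index: int = 1) -> str:
    """Build a stable id such as "planner-1" for one logical conversation."""
    session_id = f"{role}-{index}"
    if len(session_id) > MAX_SESSION_ID_LEN:
        raise ValueError(f"session id exceeds {MAX_SESSION_ID_LEN} characters")
    return session_id


def headers_for(session_id: str) -> dict[str, str]:
    """Headers to send on every call that belongs to this conversation."""
    return {"X-IR-Session-ID": session_id}


# A fixed set of ids, decided once per orchestrator and reused on every turn.
WORKER_IDS = [sticky_id("worker", i) for i in range(1, 6)]
```

The anti-pattern would be building the id from something that changes per request (a message UUID or a timestamp), since each such id pins a new PTY.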
Many sticky Sessions in parallel
Pin N agents at once with N distinct ids. Each gets its own PTY + isolated claude conversation memory; the daemon spawns as many as you ask for. Example: a planner driving 5 workers + a critic:
import asyncio
import anthropic

# Point the SDK at the local daemon; the API key is a placeholder.
client = anthropic.AsyncAnthropic(api_key="unused", base_url="http://localhost:7421")

async def agent(role: str, prompt: str):
    # The role doubles as the sticky Session id.
    return await client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}],
        extra_headers={"X-IR-Session-ID": role},
    )

async def main():
    # All seven requests run concurrently, one Session per id.
    return await asyncio.gather(
        agent("planner-1", "..."),
        agent("worker-1", "..."), agent("worker-2", "..."),
        agent("worker-3", "..."), agent("worker-4", "..."),
        agent("worker-5", "..."),
        agent("critic-1", "..."),
    )

results = asyncio.run(main())
Seven concurrent Sessions, seven isolated PTYs, no cross-talk. Subsequent calls with the same role-id stay pinned to the same PTY and pick up the prior conversation.
Practical ceiling: a typical claude subprocess holds ~150–250 MB resident. 50 parallel agents ≈ 8–12 GB. The daemon won't stop you from going further; your kernel will.
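The arithmetic behind that ceiling is simple enough to script. The sketch below uses only the per-process figures quoted above; the function names and the 16 GB budget are illustrative assumptions, and 1 GB is taken as 1000 MB to keep the numbers round.

```python
def resident_gb(agents: int, mb_per_process: float) -> float:
    """Estimated resident memory in GB (1 GB = 1000 MB for round numbers)."""
    return agents * mb_per_process / 1000


def max_agents(ram_budget_gb: float, mb_per_process: float = 250) -> int:
    """Largest agent count that fits the budget at the given per-process size."""
    return int(ram_budget_gb * 1000 // mb_per_process)


low = resident_gb(50, 150)   # 7.5 GB at the low end of the range
high = resident_gb(50, 250)  # 12.5 GB at the high end
print(f"50 agents: {low:.1f}-{high:.1f} GB")
print(f"16 GB budget: up to {max_agents(16)} agents at 250 MB each")
```

The 7.5 to 12.5 GB result is what the text rounds to 8–12 GB. Sizing against the high end of the range leaves headroom for everything else running on the machine.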
Resetting a Sticky Session
To wipe a Sticky Session's conversation memory without losing the id:
curl -X POST http://localhost:7421/v1/sessions/planner-1/reset
# {"ok":true,"session_id":"planner-1","latency_ms":12}
The daemon drops the existing PTY (kills the claude subprocess) and binds a fresh Pre-warmed PTY to the same id. This takes ~10 ms in the happy path and is useful between distinct conversations on the same logical agent identity.
Deleting a Sticky Session
curl -X DELETE http://localhost:7421/v1/sessions/planner-1
# {"ok":true,"session_id":"planner-1"}
Drops the PTY entirely. The next call with that id spawns a fresh Session from the Pool. Use this for cleanup at orchestrator shutdown.
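For orchestrators written in Python, the reset and delete calls can be wrapped with the standard library alone. This is a minimal sketch, not an official client: the helper names are hypothetical, percent-encoding the id is an assumption about how the daemon parses the path, and the response is assumed to be the JSON shown in the curl examples.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:7421"  # daemon address used throughout this page


def session_request(method: str, session_id: str, action: str = "") -> urllib.request.Request:
    """Build (but do not send) a request against /v1/sessions/<id>[/<action>]."""
    # Assumption: ids with unusual characters are percent-encoded in the path.
    path = f"/v1/sessions/{quote(session_id, safe='')}"
    if action:
        path += f"/{action}"
    return urllib.request.Request(BASE_URL + path, method=method)


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body. Requires a running daemon."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def reset_session(session_id: str) -> dict:
    """Wipe conversation memory but keep the id bound."""
    return send(session_request("POST", session_id, "reset"))


def delete_session(session_id: str) -> dict:
    """Drop the Session and its PTY entirely."""
    return send(session_request("DELETE", session_id))
```

A typical use is to call `delete_session` for each id in a `finally` block at orchestrator shutdown, so PTYs are released even when the run fails partway.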
Inspecting the Pool
curl http://localhost:7421/v1/sessions
# {"sessions":[],"pool":{"idle":2,"active":1,"spawning":0}}
- idle = Pre-warmed PTYs ready to serve.
- active = currently bound to a Sticky Session id.
- spawning = background replenisher in flight.
idle and spawning are bounded by TARGET_IDLE (default 2). active is unbounded; the idle-reaper sweeps every 5 minutes and drops Sessions whose last access is older than 30 minutes.
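The reaper's behavior can be modeled with a dictionary of last-access times. The class below is a toy model for reasoning about timeouts, not the daemon's implementation: `StickySessions` and its methods are hypothetical names, and the injectable clock exists only so the logic can be exercised without waiting 30 minutes.

```python
import time

IDLE_TIMEOUT_S = 30 * 60   # Sessions untouched this long are dropped
SWEEP_INTERVAL_S = 5 * 60  # how often the sweep runs
# Under this model a Session can sit idle for up to 35 minutes before a sweep catches it.
WORST_CASE_IDLE_S = IDLE_TIMEOUT_S + SWEEP_INTERVAL_S


class StickySessions:
    """Toy model of the active set: session id -> last access time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_access: dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        """Record a request against this id (creates the entry if new)."""
        self._last_access[session_id] = self._clock()

    def sweep(self) -> list[str]:
        """Drop and return ids whose last access is older than the timeout."""
        now = self._clock()
        expired = [sid for sid, seen in self._last_access.items()
                   if now - seen > IDLE_TIMEOUT_S]
        for sid in expired:
            del self._last_access[sid]  # a real daemon would also kill the PTY here
        return expired

    def active(self) -> int:
        """Number of Sessions currently bound to an id."""
        return len(self._last_access)
```

Any request against an id counts as a `touch`, so an orchestrator that polls each Session more often than every 30 minutes never loses it to a sweep.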
Pool sizing — when to worry
The default warm-pool size (2 idle) covers single-developer workloads comfortably. Three profiles to watch:
- Burst rate >2 stateless calls in <2 seconds — pool empties before the background replenisher refills. Calls 3+ wait for synchronous spawn (~2 s each). For sustained high-throughput stateless workloads, batch calls or pin sticky Sessions so the daemon doesn't churn PTYs.
- Pinning >30–50 parallel sticky agents — RAM, not the daemon, becomes the ceiling. ~200 MB per claude subprocess; 50 sessions ≈ 10 GB resident. The daemon will keep spawning; the kernel will eventually OOM-kill something. Cap your agent identity space to what your machine can hold.
- Sticky Sessions losing context after 30 min — the idle-reaper has dropped them. Either drive them more often than every 30 min, or accept the reset and re-prime the conversation on next use.
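The burst case in the first bullet can be estimated with a back-of-envelope model. It uses only the figures quoted on this page (~10 ms warm grab, ~2 s cold spawn, 2 idle PTYs) and makes two simplifying assumptions: no refill lands during the burst, and cold spawns run in parallel so each late call waits one spawn. The function name is hypothetical.

```python
WARM_GRAB_S = 0.010   # ~10 ms to grab a Pre-warmed PTY
COLD_SPAWN_S = 2.0    # ~2 s synchronous spawn
TARGET_IDLE = 2       # default warm-pool size


def burst_waits(calls: int, idle: int = TARGET_IDLE) -> list[float]:
    """Estimated pool wait per call when `calls` arrive before any refill."""
    return [WARM_GRAB_S if i < idle else COLD_SPAWN_S for i in range(calls)]


waits = burst_waits(5)
cold = sum(w >= COLD_SPAWN_S for w in waits)
print(waits)  # [0.01, 0.01, 2.0, 2.0, 2.0]
print(f"{cold} of {len(waits)} calls pay the cold spawn")
```

In this model a burst of five stateless calls leaves three of them waiting roughly two seconds, which is the point at which batching or pinning sticky Sessions starts to pay off.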
Where to go next
- Build a multi-turn agent loop → Agents Cookbook
- Tool use across turns → Tools
- Daemon failure modes → Troubleshooting