Streaming — The Stream Synchronization Guide
inference-relay normalizes the streaming behavior of every supported provider into a single, predictable interface. Whether the underlying transport uses server-sent events, asynchronous stream decoding, or a native subscription gateway, your application code sees the same event surface.
Universal Protocol Decoder
The Universal Protocol Decoder is the core abstraction that makes multi-provider streaming possible. It accepts the raw output stream from any supported provider and translates it into a unified event sequence.
- Anthropic — SSE transport, normalized in real time
- Native Subscription Gateway — Asynchronous Stream Decoding, normalized in real time
- Ollama — Asynchronous Stream Decoding, normalized in real time
- OpenAI — SSE transport, normalized in real time
Regardless of which provider handles the request, your code receives the same event interface. No conditional logic per provider. No format detection. The Universal Protocol Decoder handles all of it.
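As a rough illustration of what this normalization involves, the sketch below maps one OpenAI-style SSE line and one Ollama-style newline-delimited JSON line onto the same content_block_delta shape used on this page. The helper names (toDelta, normalizeSseLine, normalizeNdjsonLine) are hypothetical and are not part of inference-relay; the decoder's real internals are not described here.

```javascript
// Illustrative only: these helpers are hypothetical, not inference-relay APIs.

// Wrap a text fragment in the unified event shape used throughout this guide.
function toDelta(text) {
  return { type: 'content_block_delta', delta: { text } };
}

// OpenAI-style SSE line: data: {"choices":[{"delta":{"content":"Hi"}}]}
function normalizeSseLine(line) {
  if (!line.startsWith('data:')) return null; // skip comments and other SSE fields
  const payload = line.slice('data:'.length).trim();
  if (payload === '' || payload === '[DONE]') return null; // end-of-stream sentinel
  const text = JSON.parse(payload).choices?.[0]?.delta?.content;
  return typeof text === 'string' && text.length > 0 ? toDelta(text) : null;
}

// Ollama-style NDJSON line: {"message":{"content":"Hi"},"done":false}
function normalizeNdjsonLine(line) {
  if (line.trim() === '') return null;
  const text = JSON.parse(line).message?.content;
  return typeof text === 'string' && text.length > 0 ? toDelta(text) : null;
}
```

Both functions return either null (nothing to deliver) or an event that the consuming loop can treat identically, which is the property the decoder is described as providing.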
Event Interface
The stream implements the async iterable protocol. Each yielded event follows the same shape regardless of the upstream provider:
for await (const event of stream) {
  if (event.type === 'content_block_delta') {
    process.stdout.write(event.delta.text);
  }
}
This is the lowest-level consumption pattern. Every event is delivered as it arrives, with no buffering beyond what the Universal Protocol Decoder requires for normalization.
Event Handlers
For convenience, named event handlers provide a higher-level interface over the raw event stream:
stream.on('text', (text) => {
  // Real-time text deltas
  process.stdout.write(text);
});
stream.on('message', (message) => {
  // Complete accumulated message (fired once)
  console.log('Final:', message.content);
});
stream.on('error', (error) => {
  console.error('Stream error:', error);
});
- The text handler fires for every text delta, across all content blocks.
- The message handler fires exactly once, after the stream completes, with the fully accumulated message.
- The error handler fires on any stream-level error, including transport failures and provider errors.
These handlers and the async iterator are mutually compatible. You can attach handlers and iterate the same stream.
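One way to picture how handlers and iteration can coexist is a wrapper that dispatches to handlers while it yields raw events. The sketch below is a minimal illustration under that assumption; the HandlerStream class is hypothetical, it is not how inference-relay is implemented, and message.content is simplified to a plain string.

```javascript
// Hypothetical sketch: not inference-relay's implementation.
class HandlerStream {
  constructor(source) {
    this.source = source; // any async iterable of unified events
    this.handlers = { text: [], message: [], error: [] };
  }

  // Register a handler for 'text', 'message', or 'error'.
  on(name, fn) {
    this.handlers[name].push(fn);
    return this; // allow chaining
  }

  emit(name, value) {
    for (const fn of this.handlers[name]) fn(value);
  }

  async *[Symbol.asyncIterator]() {
    let text = '';
    try {
      for await (const event of this.source) {
        if (event.type === 'content_block_delta') {
          text += event.delta.text;
          this.emit('text', event.delta.text);
        }
        yield event; // the iterator still sees every raw event
      }
      this.emit('message', { content: text }); // fired once, after completion
    } catch (error) {
      this.emit('error', error);
      throw error;
    }
  }
}
```

In this sketch the handlers only fire while something is iterating the wrapper, which mirrors the requirement that a stream be consumed for its message to populate.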
Final Message
If you only need the completed message and do not need to process deltas in real time, use finalMessage():
const stream = await client.messages.create({ ..., stream: true });
const finalMessage = await stream.finalMessage();
// Access: finalMessage.content, finalMessage.usage, finalMessage.stop_reason
Key Behaviors
- A 5-minute safety timeout prevents indefinite hangs. If the provider stops sending events without closing the stream, the timeout fires and the promise rejects with a timeout error.
- You must iterate the stream or attach at least one event handler for the message to populate. Calling finalMessage() without consuming the stream leaves the promise pending until the safety timeout rejects it.
- If the stream completes normally, finalMessage() resolves with the fully accumulated message, including usage statistics.
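The timeout behavior listed above resembles a standard Promise.race guard. The following is a minimal sketch of that general pattern, not the library's code; withTimeout is a hypothetical helper, and how inference-relay implements its own timeout is not stated here.

```javascript
// Hypothetical helper illustrating a safety timeout; not an inference-relay API.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer so the process can exit.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

With the 5-minute limit described in this section, the equivalent guard would be withTimeout(somePromise, 5 * 60 * 1000).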
Data Sovereignty Lock
A stream can only be iterated once. Attempting to iterate a stream a second time throws an error immediately.
This is the Data Sovereignty Lock. It guarantees that the consuming code path has exclusive, unforkable access to the stream content. The stream data cannot be intercepted, duplicated, or silently observed by other parts of the application.
Why this matters:
- Prevents accidental double-processing of streamed content
- Ensures that stream-derived state (accumulated tokens, usage counters) is consistent
- Eliminates an entire class of bugs where two consumers race over the same event sequence
If you need the stream content in multiple places, consume it once and distribute the final message.
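The single-iteration rule and the "consume once, distribute the result" advice can be sketched as follows. SingleUseStream and collectText are hypothetical names used only for illustration; the error message and internals of the real lock are assumptions.

```javascript
// Hypothetical wrapper showing a one-shot iteration guard.
class SingleUseStream {
  constructor(source) {
    this.source = source; // any async iterable of unified events
    this.consumed = false;
  }

  [Symbol.asyncIterator]() {
    if (this.consumed) {
      throw new Error('Stream has already been iterated');
    }
    this.consumed = true;
    return this.source[Symbol.asyncIterator]();
  }
}

// Consume once, then hand the collected result to every interested party.
async function collectText(stream) {
  let text = '';
  for await (const event of stream) {
    if (event.type === 'content_block_delta') text += event.delta.text;
  }
  return text;
}
```

A caller would await collectText(stream) a single time and pass the returned string to each consumer, rather than giving each consumer the stream itself.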
Logic Integrity Buffer
During protocol recovery states — when the Universal Protocol Decoder detects and corrects a transport anomaly — the stream includes a Logic Integrity Buffer. This is a brief staggering period (transparent to the consumer) that ensures stream stability before normal delivery resumes.
- The buffer is invisible to application code. No special handling is needed.
- It does not affect the data integrity of the stream. Every event that would have been delivered is still delivered, in order.
- It ensures that recovery-state transitions do not produce malformed output or duplicate events.
You will never need to account for the Logic Integrity Buffer in your code. It exists purely as an internal safety mechanism within the Universal Protocol Decoder.
AbortController Support
Streams support the standard AbortController interface for cancellation:
const controller = new AbortController();
const stream = await client.messages.create({
  ...,
  stream: true,
}, { signal: controller.signal });
// Cancel at any time:
controller.abort();
Cancellation Behavior
- The AbortError re-throws immediately with no cascade. Your catch block receives it cleanly.
- Process cleanup happens automatically. The underlying transport connection is closed, and any Native Subscription Gateway processes are terminated.
- Aborting a stream that has already completed is a no-op.
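A catch block that separates cancellation from real failures might look like the sketch below. To keep it self-contained, fakeStream stands in for a provider stream and is purely hypothetical, as is readUntilAborted. Checking error.name === 'AbortError' follows the common web-platform convention and is assumed, not confirmed, to match what inference-relay throws.

```javascript
// Stand-in for a provider stream (hypothetical); it yields deltas until aborted.
async function* fakeStream(signal) {
  for (const text of ['one ', 'two ', 'three ']) {
    if (signal.aborted) {
      const error = new Error('The operation was aborted');
      error.name = 'AbortError';
      throw error;
    }
    yield { type: 'content_block_delta', delta: { text } };
  }
}

async function readUntilAborted() {
  const controller = new AbortController();
  const received = [];
  try {
    for await (const event of fakeStream(controller.signal)) {
      received.push(event.delta.text);
      controller.abort(); // cancel after the first delta
    }
  } catch (error) {
    if (error.name !== 'AbortError') throw error; // unexpected failures propagate
    return { aborted: true, received };
  }
  return { aborted: false, received };
}
```

Content received before the abort remains usable; only the remainder of the stream is discarded.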
Level 1 Compatibility
When using the auto-patch integration (import 'inference-relay/auto'), the full streaming interface described on this page is preserved exactly. Existing streaming code that works with the Anthropic SDK works unchanged with inference-relay.
Additional metadata is available on the stream object:
- stream.provider: which provider actually handled this request
- stream.costUsd: real-time cost tracking as tokens are consumed
These properties are additive. They do not alter the behavior of the standard streaming interface.
Content Accumulation
The Universal Protocol Decoder automatically accumulates content blocks during iteration. You do not need to manually concatenate text deltas or track content block indices.
- During iteration, each content_block_delta event is applied to the internal accumulator.
- When the stream completes, finalMessage() returns the fully assembled message including all content blocks, usage statistics, and the stop reason.
- No manual accumulation is needed. The stream handles it internally, regardless of provider.
This means you can freely mix real-time delta processing (via the iterator or event handlers) with final-message access, and both will reflect the complete content.
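For readers curious about what this accumulation saves them from writing, here is a minimal sketch of a per-block accumulator. createAccumulator is hypothetical, and the event.index field and the { type: 'text', text } block shape are assumptions borrowed from Anthropic-style events rather than details confirmed on this page.

```javascript
// Hypothetical accumulator; fields beyond `type` and `delta.text` are assumptions.
function createAccumulator() {
  const blocks = []; // one string per content block, positioned by index
  return {
    // Fold a single event into the running state; non-delta events are ignored.
    apply(event) {
      if (event.type !== 'content_block_delta') return;
      const index = event.index ?? 0;
      blocks[index] = (blocks[index] ?? '') + event.delta.text;
    },
    // Assemble the blocks seen so far; skipped indices become empty blocks.
    finalMessage() {
      return {
        content: Array.from(blocks, (text) => ({ type: 'text', text: text ?? '' })),
      };
    },
  };
}
```

Because deltas are keyed by block index, interleaved deltas from different content blocks still end up in the right block.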
Continue reading: Instruction Protocol for the Two-Envelope architecture and IP protection model.