Security Architecture — The RSA Fortress

inference-relay is designed around a single axiom: the relay never sees what you say to AI, and it never sees what AI says back. This is not a policy decision enforced by access controls — it is a structural guarantee enforced by the TypeScript compiler before any code executes.

1. Reverse Engineering Resistance

A common concern from developers: “If the relay runs on my user's machine, can they reverse engineer my app?”

No. Your users can inspect network traffic and processes on their own machine, but they cannot see your system prompts, your tool definitions, or any of your application logic. The relay is a dumb pipe for inference — your secret sauce stays on your server.

What Users Can See

The relay's API endpoints and request/response metadata
The embedded RSA public key (which is public by design — see Section 5)
Operational telemetry fields (provider, model, token counts — see Section 8)

What Users Cannot See

System prompts — Never captured, never transmitted, never logged
Tool definitions — Your function schemas stay server-side
Routing configuration — Your cascade rules, tier logic, and business logic never leave your backend
Other users' data — License keys scope to the holder's own audit trail, usage, and fleet
Server-side signing authority — The private key that signs JWS responses is air-gapped on secure infrastructure

Your users can see how the relay works — the mechanism is transparent. They cannot see your configuration, your usage, or your server-side logic. Like seeing a lock through a glass door — you understand how the pins and tumblers work, but you still cannot open it without the key.

2. Deterministic Privacy — The Dumb Pipe Guarantee

The relay's audit subsystem defines prompt and completion content fields as TypeScript literal false types:

interface AuditEvent {
  promptContent: false;      // literal type — not boolean, not string
  completionContent: false;  // compile error if you assign anything else
}

This is not boolean. It is not string | null. It is the literal value false. Any attempt to assign prompt text, completion text, or any truthy value to these fields results in a compile-time error — the code cannot be built, let alone shipped.
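
As a rough illustration of the behavior described above, the sketch below repeats the AuditEvent interface and shows one assignment that type-checks and one that does not. The variable names are illustrative and are not taken from the relay's source.

```typescript
interface AuditEvent {
  promptContent: false;
  completionContent: false;
}

// Accepted: both fields hold the literal value false.
const accepted: AuditEvent = { promptContent: false, completionContent: false };

// Rejected by the type checker: a string is not assignable to the literal type false.
// The directive below tells the compiler that the next line is expected to be an error.
// @ts-expect-error
const rejected: AuditEvent = { promptContent: "user prompt text", completionContent: false };
```

The check applies wherever the project is type-checked, for example by tsc during the build. Types are erased from the emitted JavaScript, so this is a build-time check rather than a runtime one.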

Why this matters: Most privacy guarantees are runtime assertions. They can be bypassed by a misconfigured flag, an overlooked code path, or a well-intentioned logging statement. Deterministic Privacy operates at a fundamentally different level. The TypeScript compiler acts as a mathematical proof that content transmission is impossible within the relay's audit surface.

CI Enforcement: A Binary String Entropy Scan runs against compiled output on every build, verifying that no high-entropy string content (which would indicate embedded prompt or completion data) exists in the production artifact.
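
The document does not show how the scan is implemented. A minimal sketch of one way such a check could work, assuming Shannon entropy over quoted string literals, is shown below. The function names, the 20-character minimum, and the 4.5 bits-per-character threshold are all hypothetical.

```typescript
// Shannon entropy of a string, in bits per character.
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / text.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Returns quoted string literals in the artifact that are long enough and
// exceed the entropy threshold. Both limits are assumed values.
function findHighEntropyStrings(
  artifact: string,
  minLength = 20,
  thresholdBits = 4.5,
): string[] {
  const literals = artifact.match(/"[^"\n]*"|'[^'\n]*'/g) || [];
  return literals
    .map((literal) => literal.slice(1, -1)) // strip the surrounding quotes
    .filter((s) => s.length >= minLength && shannonEntropy(s) > thresholdBits);
}
```

A CI step built on this idea would fail the build whenever findHighEntropyStrings returns a non-empty list for the compiled output.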

3. Volatile Memory Scoping — Credential Isolation

Credentials exist only as transient environment states within the isolated execution context and are purged upon task completion. At no point are credentials written to disk, logged, or transmitted by the relay.
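
The relay's actual mechanism is not shown here. Purely as an illustration of the scoping idea, the sketch below holds a credential in memory for the duration of one synchronous task and overwrites it afterwards. The helper name withCredential and its parameters are hypothetical.

```typescript
// Runs a synchronous task with a credential held only in memory, then
// overwrites the bytes whether the task returned or threw.
// Note: if `use` returned a Promise, the bytes would be cleared before it settled,
// so this sketch is only suitable for synchronous tasks.
function withCredential<T>(
  load: () => Uint8Array,
  use: (secret: Uint8Array) => T,
): T {
  const secret = load();
  try {
    return use(secret);
  } finally {
    secret.fill(0); // purge the credential once the task is complete
  }
}
```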

Platform Credential Matrix

macOS — Hardware-Authorized Secure Enclave via OS-mediated consent prompt
Linux — libsecret system keyring with credential-file fallback
Windows — PasswordVault via Credential Manager
Electron — safeStorage with OS-level encryption
Browser — Web Crypto AES-GCM with per-origin isolation

Shell Injection Prevention

Credential retrieval uses parameterized binary invocation — never shell-mediated execution. This is a critical distinction:

Shell-mediated execution subjects arguments to shell interpolation. A credential containing $(rm -rf /) would be evaluated.
Parameterized invocation passes arguments as discrete array elements directly to the OS process API. No shell is involved. No interpolation occurs.

This eliminates an entire class of injection vectors at the system call level.
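
The distinction can be demonstrated with Node's child_process module. Using Node here is an assumption, since the document does not name the relay's process API. In the sketch, a hostile-looking argument is handed to a child process as a discrete array element and comes back unchanged.

```typescript
import { execFileSync } from "node:child_process";

// A value that a shell would expand if it were interpolated into a command string.
const hostile = "$(echo injected)";

// execFileSync starts the binary directly and passes each argument as its own
// array element. No shell is spawned, so the value is never interpreted.
// The child is the current Node binary; it writes its first extra argument back.
const echoed = execFileSync(
  process.execPath,
  ["-e", "process.stdout.write(process.argv[1])", hostile],
  { encoding: "utf8" },
);

// echoed holds the literal text of `hostile`, not the word "injected".
```

By contrast, building a single command string and passing it to a shell-based call such as exec would let the shell evaluate the substitution.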

4. Process Sandboxing — Native Subscription Gateway Isolation

The Native Subscription Gateway runs as a fully isolated execution context. Communication occurs exclusively through discrete I/O channels — there is no shared memory between the relay and the gateway process.

Isolation Properties

Parameterized invocation — Arguments passed as typed array elements, never concatenated into a shell string
Discrete I/O channels — Input, output, and diagnostic channels are independent and isolated with no cross-contamination
Lifecycle tracking — All gateway processes are tracked for deterministic cleanup
Signal management — Termination signals are issued on shutdown, with platform-specific handling to prevent zombie processes
Asynchronous Stream Decoding — Gateway output is decoded as a structured stream, with each event boundary validated before processing

Zombie Prevention

The relay maintains an active process registry. On shutdown — whether clean exit, uncaught exception, or signal interrupt — every registered process receives a termination signal. The registry is cleared atomically. No gateway process survives the parent.
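
A minimal sketch of such a registry follows. The class and method names are hypothetical, and the Killable interface is an assumption chosen so that anything with a compatible kill method (such as a Node child process handle) could be registered.

```typescript
// Anything that can receive a termination signal.
interface Killable {
  kill(signal?: string): boolean;
}

class ProcessRegistry {
  private active = new Set<Killable>();

  register(proc: Killable): void {
    this.active.add(proc);
  }

  unregister(proc: Killable): void {
    this.active.delete(proc);
  }

  get size(): number {
    return this.active.size;
  }

  // Clears the registry first, then signals every process from a snapshot,
  // so no entry remains registered even if a kill call throws.
  terminateAll(signal = "SIGTERM"): number {
    const snapshot = Array.from(this.active);
    this.active.clear();
    let signalled = 0;
    for (const proc of snapshot) {
      try {
        if (proc.kill(signal)) signalled++;
      } catch {
        // The process may already have exited; nothing left to clean up.
      }
    }
    return signalled;
  }
}
```

In a real relay, terminateAll would presumably be wired to the exit, uncaught-exception, and signal handlers mentioned above.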

5. Signed Trust Chain — Logic Synchronization Security (RS256)

The relay's configuration and entitlement system uses RSA-2048 asymmetric signatures to establish a tamper-proof trust chain between server and client.

Asymmetric Decoupling

Server — Signs responses with an RSA-2048 private key, maintained on air-gapped secure infrastructure
Client — Verifies signatures with an embedded public key. The public key can only verify; it cannot sign.

This means: compromise of any client installation — or even every client installation — cannot forge a server response. An attacker with full access to the client binary can read the public key, but reading a verification key grants zero signing capability. This is the fundamental property of asymmetric cryptography, applied here as an architectural guarantee.
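
The property can be demonstrated with Node's crypto module. The sketch below generates a throwaway RSA-2048 key pair, signs a payload with the private key using SHA-256 (the combination that JWS calls RS256), and verifies it with the public key alone. The payload fields are invented for the example, and the relay's real keys and message format are not shown in this document.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Throwaway key pair for the demonstration only.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

// Server side: sign the payload with the private key.
const payload = Buffer.from(JSON.stringify({ valid: true, plan: "example" }));
const signature = sign("sha256", payload, privateKey);

// Client side: the public key is enough to check the signature.
const genuine = verify("sha256", payload, publicKey, signature);

// A payload altered in transit no longer matches the signature.
const forgedPayload = Buffer.from(JSON.stringify({ valid: true, plan: "forged" }));
const tampered = verify("sha256", forgedPayload, publicKey, signature);
```

Here genuine is true and tampered is false. Producing a signature that verifies for the forged payload would require the private key, which the client never holds.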

Security State Machine

The relay maintains a security state that degrades gracefully under adverse conditions:

SEC_NOMINAL — Signature verified, configuration fresh. Normal operation.
SEC_CACHE — Verification unsuccessful, cached configuration valid. Operates on last-known-good signed state.
SEC_RECOVERY — Attempting to re-establish signed state. Retrying verification with backoff.
SEC_DEGRADED — Extended verification failure. Logic Synchronization Buffer active.
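
The four states are listed above, but the transitions between them are not specified. The sketch below shows one plausible reading as a pure transition function. The event names and every transition rule are assumptions made for illustration.

```typescript
type SecurityState = "SEC_NOMINAL" | "SEC_CACHE" | "SEC_RECOVERY" | "SEC_DEGRADED";

// Hypothetical events; the document does not name them.
type SecurityEvent = "VERIFY_OK" | "VERIFY_FAILED" | "RETRY_STARTED" | "RETRIES_EXHAUSTED";

function nextState(state: SecurityState, event: SecurityEvent): SecurityState {
  // Assumed rule: a successful verification restores nominal operation from any state.
  if (event === "VERIFY_OK") return "SEC_NOMINAL";

  switch (state) {
    case "SEC_NOMINAL":
      // Fall back to the last-known-good signed configuration.
      return event === "VERIFY_FAILED" ? "SEC_CACHE" : state;
    case "SEC_CACHE":
      // Begin retrying verification with backoff.
      return event === "RETRY_STARTED" ? "SEC_RECOVERY" : state;
    case "SEC_RECOVERY":
      // Extended failure: enter the degraded state.
      return event === "RETRIES_EXHAUSTED" ? "SEC_DEGRADED" : state;
    case "SEC_DEGRADED":
      // Only VERIFY_OK (handled above) leaves the degraded state.
      return state;
  }
}
```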

Logic Synchronization Buffer

In SEC_DEGRADED state, the relay introduces randomized timing decorrelation into stream processing. This serves two purposes:

Stream stability — Prevents burst-mode failures during protocol recovery from cascading into client-visible errors.
Timing decorrelation — Ensures that degraded-state behavior cannot be distinguished from nominal-state behavior through timing analysis alone.

For security auditors: this is a deliberate stability mechanism, not a defect. The jitter window is calibrated to be imperceptible to end users while providing sufficient buffer for recovery operations to complete.
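
The jitter window itself is not documented. As a sketch of the general technique only, a randomized delay within a bounded window can be computed as follows. The function name, the parameters, and the injectable random source are hypothetical.

```typescript
// Returns a delay in milliseconds drawn uniformly from [minMs, maxMs).
// The random source is injectable so the behavior can be tested deterministically.
function jitterDelayMs(
  minMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError("expected 0 <= minMs <= maxMs");
  }
  return minMs + random() * (maxMs - minMs);
}
```

A stream processor would wait for the returned delay before forwarding each chunk while in SEC_DEGRADED.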

6. MITM Protection

The signature verification system enforces a strict fail-closed posture for authorization decisions:

Unsigned valid:true responses are rejected. An attacker who intercepts the verification request and returns an unsigned positive response gains nothing — the client will not accept it.
Only signed JWS responses are accepted for positive validation. The server's RSA-2048 signature is the sole authority.
Unsigned error responses are accepted for error reporting only. This is a deliberate fail-open for diagnostics: if the server returns an unsigned error (network issue, server outage), the client can surface the error message without requiring a signature. But an unsigned error can never grant authorization.

This asymmetry — fail-open for errors, fail-closed for authorization — ensures that network-level attacks can disrupt service but cannot grant unauthorized access.
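
The three rules above can be sketched as a single decision function. The response shape, the type names, and the injected verifier are assumptions made for illustration. The verifier stands in for real JWS signature verification and returns the signed payload, or null when the signature does not check out.

```typescript
type RelayResponse =
  | { signed: true; jws: string }
  | { signed: false; body: { valid?: boolean; error?: string } };

type Decision = { authorized: boolean; error?: string };

// Hypothetical verifier: returns the verified payload, or null if the signature is invalid.
type JwsVerifier = (jws: string) => { valid: boolean } | null;

function decide(response: RelayResponse, verifyJws: JwsVerifier): Decision {
  if (response.signed) {
    // Fail closed: authorization requires a payload whose signature verified.
    const payload = verifyJws(response.jws);
    return { authorized: payload !== null && payload.valid === true };
  }
  // Unsigned responses never authorize, even when they claim valid: true.
  // An unsigned error is surfaced for diagnostics only.
  return response.body.error !== undefined
    ? { authorized: false, error: response.body.error }
    : { authorized: false };
}
```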

7. Prompt Visibility

A common concern: “Where do my prompts go?”

With the Native Subscription Gateway (Auto-patch)

Prompts are routed through the user's own Claude subscription. They appear in the user's Claude activity history. This is their subscription, their data, their existing relationship with Anthropic. The relay is not in the data path — it orchestrates the connection, then steps aside.

With API Providers

Prompts are sent through standard Anthropic or OpenAI API endpoints, governed by the provider's existing Data Processing Agreement (DPA). The relay adds headers and manages streaming — it does not store, log, or inspect content.

In Both Cases

inference-relay servers see zero prompt content and zero completion content. This is not a configuration option. It is the Deterministic Privacy guarantee described in Section 2, enforced at the type level before compilation.

8. Telemetry

The relay transmits operational telemetry for usage tracking and cost attribution. The telemetry payload is strictly scoped:

Fields Transmitted

Provider — Which AI service was used
Model — Which model was invoked
Input token count — Usage metering
Output token count — Usage metering
Estimated cost — Cost attribution
Duration — Performance monitoring
Fallback — Whether a backup provider was used

Fields Never Transmitted

Prompt content — Structurally excluded (literal false type)
Completion content — Structurally excluded (literal false type)
System prompts — Not captured at any layer
Tool definitions — Not captured at any layer
Function call arguments — Not captured at any layer
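
One way to picture the scoping is a payload type that contains only the transmitted fields, plus a builder that copies those fields by name. The sketch below uses hypothetical field and function names, since the actual wire format is not documented here.

```typescript
// Only the fields listed under "Fields Transmitted"; names are illustrative.
interface TelemetryEvent {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  estimatedCost: number;
  durationMs: number;
  fallback: boolean;
}

// Copies the allowed fields one by one, so any extra properties on the
// source record (prompt text, tool definitions, and so on) are left behind.
function toTelemetryEvent(
  record: TelemetryEvent & { [extra: string]: unknown },
): TelemetryEvent {
  return {
    provider: record.provider,
    model: record.model,
    inputTokens: record.inputTokens,
    outputTokens: record.outputTokens,
    estimatedCost: record.estimatedCost,
    durationMs: record.durationMs,
    fallback: record.fallback,
  };
}
```

Building the payload from an explicit allow-list, rather than deleting known-sensitive keys, means a newly added field is excluded by default.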

Failure Handling

Telemetry is fire-and-forget. If the telemetry endpoint is unreachable, the event is dropped silently. Telemetry failure never blocks, delays, or degrades inference. The user's AI interaction completes regardless of telemetry state.
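
A minimal sketch of the fire-and-forget pattern is shown below. The sender is injected as a function, which is an assumption made so the example stays free of any particular HTTP client. The names emitTelemetry and TelemetrySender are hypothetical.

```typescript
type TelemetrySender = (event: object) => Promise<void>;

// Starts the send and returns immediately. Failures are swallowed so they
// can never surface in, or delay, the inference path.
function emitTelemetry(event: object, send: TelemetrySender): void {
  try {
    // Not awaited: the caller continues without waiting for the result.
    send(event).catch(() => {
      // Asynchronous failure (for example, endpoint unreachable): drop the event.
    });
  } catch {
    // Synchronous failure while starting the send: also dropped.
  }
}
```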

Continue reading: Enterprise Security Whitepaper for the full architectural treatment intended for security auditors.