Security Architecture — The RSA Fortress
inference-relay is designed around a single axiom: the relay never sees what you say to AI, and it never sees what AI says back. This is not a policy decision enforced by access controls — it is a structural guarantee enforced by the TypeScript compiler before any code executes.
1. Reverse Engineering Resistance
A common concern from developers: “If the relay runs on my user's machine, can they reverse engineer my app?”
No. Your users can inspect network traffic and processes on their own machine, but they cannot see your system prompts, your tool definitions, or any of your application logic. The relay is a dumb pipe for inference — your secret sauce stays on your server.
What Users Can See
- The relay's API endpoints and request/response metadata
- The embedded RSA public key (which is public by design — see Section 5)
- Operational telemetry fields (provider, model, token counts — see Section 8)
What Users Cannot See
- System prompts — Never captured, never transmitted, never logged
- Tool definitions — Your function schemas stay server-side
- Routing configuration — Your cascade rules, tier logic, and business logic never leave your backend
- Other users' data — License keys scope to the holder's own audit trail, usage, and fleet
- Server-side signing authority — The private key that signs JWS responses is air-gapped on secure infrastructure
Your users can see how the relay works — the mechanism is transparent. They cannot see your configuration, your usage, or your server-side logic. Like seeing a lock through a glass door — you understand how the pins and tumblers work, but you still cannot open it without the key.
2. Deterministic Privacy — The Dumb Pipe Guarantee
The relay's audit subsystem defines prompt and completion content fields as TypeScript literal false types:
interface AuditEvent {
  promptContent: false; // literal type: not boolean, not string
  completionContent: false; // compile error if you assign anything else
}
This is not boolean. It is not string | null. It is the literal value false. Any attempt to assign prompt text, completion text, or any truthy value to these fields results in a compile-time error, so the code cannot be built, let alone shipped.
Why this matters: Most privacy guarantees are runtime assertions. They can be bypassed by a misconfigured flag, an overlooked code path, or a well-intentioned logging statement. Deterministic Privacy operates at a fundamentally different level. The TypeScript compiler acts as a mathematical proof that content transmission is impossible within the relay's audit surface.
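A minimal sketch of how the literal type behaves in practice. The `provider` and `model` fields and the `makeAuditEvent` helper are assumptions added for illustration; they are not taken from the relay's source.

```typescript
// Sketch only: `provider`, `model`, and `makeAuditEvent` are illustrative
// assumptions, not the relay's actual API.
interface AuditEvent {
  provider: string;
  model: string;
  promptContent: false; // literal type: the only assignable value is `false`
  completionContent: false;
}

function makeAuditEvent(provider: string, model: string): AuditEvent {
  return { provider, model, promptContent: false, completionContent: false };
}

const auditEvent = makeAuditEvent("example-provider", "example-model");

// Uncommenting the next line makes the build fail, because a string is not
// assignable to the literal type `false`:
// const leaky: AuditEvent = { ...auditEvent, promptContent: "user prompt text" };
```

The serialized event therefore always carries `false` in both content fields, whatever the surrounding code does with the prompt itself.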
CI Enforcement: A Binary String Entropy Scan runs against compiled output on every build, verifying that no high-entropy string content (which would indicate embedded prompt or completion data) exists in the production artifact.
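The scan's implementation is not shown here. As a rough sketch of the underlying idea, Shannon entropy can be computed per extracted string and compared against a threshold. The function names, the minimum length, and the threshold below are all assumptions chosen for illustration.

```typescript
// Sketch only: the threshold and minimum length are illustrative assumptions,
// not the values used by the actual CI scan.

/** Shannon entropy of a string, in bits per character. */
function shannonEntropy(text: string): number {
  const chars = Array.from(text);
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of chars) {
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let entropy = 0;
  counts.forEach((n) => {
    const p = n / chars.length;
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

/** Returns the candidate strings whose entropy meets or exceeds the threshold. */
function flagHighEntropy(
  candidates: string[],
  minLength = 20,
  thresholdBits = 4.5,
): string[] {
  return candidates.filter(
    (s) => s.length >= minLength && shannonEntropy(s) >= thresholdBits,
  );
}
```

A CI step built on this idea would extract string literals from the compiled artifact, pass them to `flagHighEntropy`, and fail the build if the result is non-empty.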
3. Volatile Memory Scoping — Credential Isolation
Credentials exist only as transient environment states within the isolated execution context and are purged upon task completion. At no point are credentials written to disk, logged, or transmitted by the relay.
Platform Credential Matrix
- macOS — Hardware-Authorized Secure Enclave via OS-mediated consent prompt
- Linux — libsecret system keyring with credential-file fallback
- Windows — PasswordVault via Credential Manager
- Electron — safeStorage with OS-level encryption
- Browser — Web Crypto AES-GCM with per-origin isolation
Shell Injection Prevention
Credential retrieval uses parameterized binary invocation — never shell-mediated execution. This is a critical distinction:
- Shell-mediated execution subjects arguments to shell interpolation. A credential containing $(rm -rf /) would be evaluated.
- Parameterized invocation passes arguments as discrete array elements directly to the OS process API. No shell is involved. No interpolation occurs.
This eliminates an entire class of injection vectors at the system call level.
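The difference can be sketched with Node's `execFileSync`, which passes its argument array to the OS without spawning a shell unless the `shell` option is set. The `runHelper` wrapper is a hypothetical name, and the Node binary itself stands in for a credential helper so that the example runs anywhere Node does.

```typescript
import { execFileSync } from "node:child_process";

// Sketch only: `runHelper` is a hypothetical wrapper, not the relay's API.
// Arguments reach the child process as discrete array elements, so shell
// metacharacters such as `$(...)` are never expanded.
function runHelper(binary: string, args: string[]): string {
  return execFileSync(binary, args, { encoding: "utf8" });
}

// A hostile-looking argument is delivered to the child verbatim.
const hostile = "$(echo injected)";
const echoed = runHelper(process.execPath, [
  "-e",
  "process.stdout.write(process.argv[1])",
  hostile,
]);
```

Here `echoed` holds the literal text `$(echo injected)`. A shell-mediated call that concatenated the same value into a command string would have run the substitution instead.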
4. Process Sandboxing — Native Subscription Gateway Isolation
The Native Subscription Gateway runs as a fully isolated execution context. Communication occurs exclusively through discrete I/O channels — there is no shared memory between the relay and the gateway process.
Isolation Properties
- Parameterized invocation — Arguments passed as typed array elements, never concatenated into a shell string
- Discrete I/O channels — Input, output, and diagnostic channels are independent and isolated with no cross-contamination
- Lifecycle tracking — All gateway processes are tracked for deterministic cleanup
- Signal management — Termination signals are issued on shutdown, with platform-specific handling to prevent zombie processes
- Asynchronous Stream Decoding — Gateway output is decoded as a structured stream, with each event boundary validated before processing
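The gateway's wire format is not specified in this document. Assuming newline-delimited JSON events purely for illustration, boundary validation might look like the following sketch. `GatewayEvent`, `EventStreamDecoder`, and the required `type` field are hypothetical.

```typescript
// Sketch only: assumes newline-delimited JSON events. The real gateway
// format is not described in this document.
type GatewayEvent = { type: string } & Record<string, unknown>;

function isGatewayEvent(value: unknown): value is GatewayEvent {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string"
  );
}

class EventStreamDecoder {
  private buffer = "";

  /** Feed a raw chunk; returns only the events completed by this chunk. */
  push(chunk: string): GatewayEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete event (or ""), so keep it buffered.
    this.buffer = lines.pop() ?? "";
    const events: GatewayEvent[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      const parsed: unknown = JSON.parse(trimmed); // throws on malformed JSON
      if (!isGatewayEvent(parsed)) {
        throw new Error("Malformed gateway event: missing string `type`");
      }
      events.push(parsed);
    }
    return events;
  }
}
```

An event split across two chunks is held back until its terminating newline arrives, so downstream code never sees a partial event.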
Zombie Prevention
The relay maintains an active process registry. On shutdown — whether clean exit, uncaught exception, or signal interrupt — every registered process receives a termination signal. The registry is cleared atomically. No gateway process survives the parent.
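One way to picture such a registry, as a sketch rather than the relay's actual code: `Terminable` is a minimal stand-in for a child process handle, and `ProcessRegistry` is a hypothetical name.

```typescript
// Sketch only: `Terminable` and `ProcessRegistry` are hypothetical names.
interface Terminable {
  kill(signal?: string): boolean;
}

class ProcessRegistry {
  private active = new Set<Terminable>();

  register(proc: Terminable): void {
    this.active.add(proc);
  }

  unregister(proc: Terminable): void {
    this.active.delete(proc);
  }

  get size(): number {
    return this.active.size;
  }

  /** Signal every tracked process; returns how many accepted the signal. */
  terminateAll(signal = "SIGTERM"): number {
    // Swap the set out first so the registry is already empty before any
    // signal is sent, even if a kill call throws.
    const snapshot = this.active;
    this.active = new Set<Terminable>();
    let signalled = 0;
    snapshot.forEach((proc) => {
      try {
        if (proc.kill(signal)) signalled += 1;
      } catch {
        // A process that has already exited must not block the rest.
      }
    });
    return signalled;
  }
}
```

In a Node process, `terminateAll` could be called from handlers for the `exit`, `SIGINT`, `SIGTERM`, and `uncaughtException` events so that every shutdown path reaches it.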
5. Signed Trust Chain — Logic Synchronization Security (RS256)
The relay's configuration and entitlement system uses RSA-2048 asymmetric signatures to establish a tamper-proof trust chain between server and client.
Asymmetric Decoupling
- Server — Signs responses with an RSA-2048 private key, maintained on air-gapped secure infrastructure
- Client — Verifies signatures with an embedded public key. The public key can only verify; it cannot sign.
This means: compromise of any client installation — or even every client installation — cannot forge a server response. An attacker with full access to the client binary can read the public key, but reading a verification key grants zero signing capability. This is the fundamental property of asymmetric cryptography, applied here as an architectural guarantee.
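The sign-and-verify split can be sketched with Node's built-in `crypto` module. The key pair is generated inline only so that the example is self-contained, and the function names and payload shape are assumptions. In the design described above, the private key stays on the server and only the public key ships with the client.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Sketch only: in the described design these two keys live on different machines.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

const toB64Url = (text: string): string =>
  Buffer.from(text, "utf8").toString("base64url");

/** Server side: produce a compact JWS (header.payload.signature) using RS256. */
function signCompactJws(payload: object): string {
  const signingInput = `${toB64Url(JSON.stringify({ alg: "RS256" }))}.${toB64Url(JSON.stringify(payload))}`;
  // "sha256" with an RSA key defaults to PKCS#1 v1.5 padding, which is RS256.
  const signature = sign("sha256", Buffer.from(signingInput, "ascii"), privateKey);
  return `${signingInput}.${signature.toString("base64url")}`;
}

/** Client side: returns the payload if the signature verifies, otherwise null. */
function verifyCompactJws(jws: string): Record<string, unknown> | null {
  const parts = jws.split(".");
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts as [string, string, string];
  try {
    const parsedHeader = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
    if (parsedHeader.alg !== "RS256") return null; // refuse any other algorithm
    const verified = verify(
      "sha256",
      Buffer.from(`${header}.${body}`, "ascii"),
      publicKey,
      Buffer.from(signature, "base64url"),
    );
    if (!verified) return null;
    return JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  } catch {
    return null; // malformed encoding or JSON counts as a failed verification
  }
}
```

Nothing in `verifyCompactJws` can produce a signature: it only needs `publicKey`, which is why shipping that key inside the client gives an attacker nothing to forge with.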
Security State Machine
The relay maintains a security state that degrades gracefully under adverse conditions:
- SEC_NOMINAL — Signature verified, configuration fresh. Normal operation.
- SEC_CACHE — Verification unsuccessful, cached configuration valid. Operates on last-known-good signed state.
- SEC_RECOVERY — Attempting to re-establish signed state. Retrying verification with backoff.
- SEC_DEGRADED — Extended verification failure. Logic Synchronization Buffer active.
Logic Synchronization Buffer
In SEC_DEGRADED state, the relay introduces randomized timing decorrelation into stream processing. This serves two purposes:
- Stream stability — Prevents burst-mode failures during protocol recovery from cascading into client-visible errors.
- Timing decorrelation — Ensures that degraded-state behavior cannot be distinguished from nominal-state behavior through timing analysis alone.
For security auditors: this is a deliberate stability mechanism, not a defect. The jitter window is calibrated to be imperceptible to end users while providing sufficient buffer for recovery operations to complete.
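A rough sketch of state-dependent jitter, using the four state names listed earlier in this section. The window size, the function names, and the way the delay is applied to a stream are assumptions for illustration. The relay's calibrated values are not published here.

```typescript
// Sketch only: the jitter window is a made-up illustrative value, not the
// relay's calibrated setting.
type SecurityState = "SEC_NOMINAL" | "SEC_CACHE" | "SEC_RECOVERY" | "SEC_DEGRADED";

const JITTER_WINDOW_MS = 40; // hypothetical upper bound

/** Delay to apply before forwarding the next stream chunk. */
function decorrelationDelayMs(
  state: SecurityState,
  random: () => number = Math.random,
): number {
  if (state !== "SEC_DEGRADED") return 0;
  return Math.floor(random() * JITTER_WINDOW_MS);
}

/** Wraps a chunk stream and inserts the delay while the state is degraded. */
async function* withDecorrelation<T>(
  source: AsyncIterable<T>,
  getState: () => SecurityState,
): AsyncGenerator<T> {
  for await (const chunk of source) {
    const delay = decorrelationDelayMs(getState());
    if (delay > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
    yield chunk;
  }
}
```

Injecting `random` keeps the delay function deterministic under test, while production code would rely on the default.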
6. MITM Protection
The signature verification system enforces a strict fail-closed posture for authorization decisions:
- Unsigned valid:true responses are rejected. An attacker who intercepts the verification request and returns an unsigned positive response gains nothing: the client will not accept it.
- Only signed JWS responses are accepted for positive validation. The server's RSA-2048 signature is the sole authority.
- Unsigned error responses are accepted for error reporting only. This is a deliberate fail-open for diagnostics: if the server returns an unsigned error (network issue, server outage), the client can surface the error message without requiring a signature. But an unsigned error can never grant authorization.
This asymmetry — fail-open for errors, fail-closed for authorization — ensures that network-level attacks can disrupt service but cannot grant unauthorized access.
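The decision rule can be sketched as a single function. The response shape, the `jws` field name, and the `verifyJws` callback are assumptions for illustration; the relay's actual wire format is not shown in this document.

```typescript
// Sketch only: `LicenseResponse`, `Decision`, and `verifyJws` are hypothetical.
interface LicenseResponse {
  valid?: boolean;
  error?: string;
  jws?: string; // compact JWS carrying the signed claims, when present
}

type Decision =
  | { authorized: true; claims: Record<string, unknown> }
  | { authorized: false; message: string };

function evaluateResponse(
  response: LicenseResponse,
  verifyJws: (jws: string) => Record<string, unknown> | null,
): Decision {
  if (response.jws !== undefined) {
    const claims = verifyJws(response.jws);
    if (claims === null) {
      return { authorized: false, message: "Signature verification failed" };
    }
    if (claims["valid"] !== true) {
      return { authorized: false, message: "Signed response did not grant access" };
    }
    return { authorized: true, claims };
  }
  // Unsigned responses can report an error but can never authorize.
  if (response.valid === true) {
    return { authorized: false, message: "Unsigned positive response rejected" };
  }
  return { authorized: false, message: response.error ?? "Unknown error" };
}
```

The only path that returns `authorized: true` runs through `verifyJws`, so an unsigned body has no way to reach it regardless of what it claims.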
7. Prompt Visibility
A common concern: “Where do my prompts go?”
With the Native Subscription Gateway (Auto-patch)
Prompts are routed through the user's own Claude subscription. They appear in the user's Claude activity history. This is their subscription, their data, their existing relationship with Anthropic. The relay is not in the data path — it orchestrates the connection, then steps aside.
With API Providers
Prompts are sent through standard Anthropic or OpenAI API endpoints, governed by the provider's existing Data Processing Agreement (DPA). The relay adds headers and manages streaming — it does not store, log, or inspect content.
In Both Cases
inference-relay servers see zero prompt content and zero completion content. This is not a configuration option. It is the Deterministic Privacy guarantee described in Section 2, enforced at the type level at compile time.
8. Telemetry
The relay transmits operational telemetry for usage tracking and cost attribution. The telemetry payload is strictly scoped:
Fields Transmitted
- Provider — Which AI service was used
- Model — Which model was invoked
- Input token count — Usage metering
- Output token count — Usage metering
- Estimated cost — Cost attribution
- Duration — Performance monitoring
- Fallback — Whether a backup provider was used
Fields Never Transmitted
- Prompt content — Structurally excluded (literal false type)
- Completion content — Structurally excluded (literal false type)
- System prompts — Not captured at any layer
- Tool definitions — Not captured at any layer
- Function call arguments — Not captured at any layer
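Taken together, the two lists suggest a payload shape along the following lines. This is a sketch: the document names the fields but not their identifiers or units, so every property name, the millisecond unit, and the `buildTelemetryEvent` helper are assumptions.

```typescript
// Sketch only: identifiers and units are assumptions, not the relay's schema.
interface TelemetryEvent {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  estimatedCost: number;
  durationMs: number;
  fallback: boolean;
  promptContent: false; // structurally excluded
  completionContent: false; // structurally excluded
}

type UsageRecord = Omit<TelemetryEvent, "promptContent" | "completionContent">;

function buildTelemetryEvent(usage: UsageRecord): TelemetryEvent {
  // Copy the allow-listed fields one by one instead of spreading `usage`,
  // so any extra runtime properties on the input are left behind.
  return {
    provider: usage.provider,
    model: usage.model,
    inputTokens: usage.inputTokens,
    outputTokens: usage.outputTokens,
    estimatedCost: usage.estimatedCost,
    durationMs: usage.durationMs,
    fallback: usage.fallback,
    promptContent: false,
    completionContent: false,
  };
}
```

Building the event field by field, rather than with an object spread, means a stray property on the input object cannot ride along into the payload.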
Failure Handling
Telemetry is fire-and-forget. If the telemetry endpoint is unreachable, the event is dropped silently. Telemetry failure never blocks, delays, or degrades inference. The user's AI interaction completes regardless of telemetry state.
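A minimal sketch of the fire-and-forget pattern. `Transport` is a hypothetical stand-in for whatever actually sends the event (an HTTP client, for example), and `emitTelemetry` is an illustrative name.

```typescript
// Sketch only: `Transport` and `emitTelemetry` are hypothetical names.
type Transport = (payload: string) => Promise<void>;

function emitTelemetry(telemetry: object, transport: Transport): void {
  // Starting from Promise.resolve() moves serialization and sending off the
  // caller's synchronous path and routes synchronous throws into the catch.
  void Promise.resolve()
    .then(() => transport(JSON.stringify(telemetry)))
    .catch(() => {
      // Dropped silently: a telemetry failure must never reach the caller.
    });
}
```

The function returns immediately and never throws or rejects, so the inference path that calls it cannot be blocked or failed by the telemetry endpoint.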
Continue reading: Enterprise Security Whitepaper for the full architectural treatment intended for security auditors.