Enterprise Security Whitepaper
This document describes the Data Sovereignty Architecture of inference-relay for enterprise security review. It is intended for CTOs, CISOs, and security auditors evaluating inference-relay as a dependency. All claims are verifiable under NDA via our 48-hour read-only repository access program.
I. Executive Summary
inference-relay implements a Data Sovereignty Architecture that fundamentally changes the security posture of AI-assisted applications. The core guarantee: inference-relay eliminates the “Third-Party Data Processor” risk by utilizing a Zero-Interception Routing model.
The library routes inference calls from a developer's application to the end user's existing AI subscription (Claude Pro, Claude Max, Claude Team, Claude Enterprise, or OpenAI equivalents). No prompt content, no completion content, and no conversation context touches inference-relay servers at any point in the request lifecycle. The developer pays inference-relay a small fee for orchestration metadata. The user's own subscription handles model execution through the provider's own infrastructure.
The practical consequence for your organization: adopting inference-relay reduces your attack surface relative to any architecture that routes prompts through a third-party API gateway. There is no new data processor to vet, no new DPA to negotiate, no new vector for prompt exfiltration. The inference path runs entirely on the user's machine, through the user's credentials, to the user's provider.
inference-relay is not a proxy. It is a local binary dependency that orchestrates execution on infrastructure your organization already trusts.
II. Hardware-Bound Credential Isolation
OS-Mediated Consent
inference-relay maintains zero persistent credential storage of its own. It does not write tokens to disk, embed them in configuration files, or cache them in application memory beyond the lifetime of a single task. Instead, it requests a transient session token from the operating system's secure credential store at the moment of use. The store differs by platform:
| Platform | Secure Storage | Mechanism |
|---|---|---|
| macOS | Hardware-Authorized Secure Enclave | OS-mediated secure storage API |
| Linux | libsecret / credential file | D-Bus Secret Service |
| Windows | PasswordVault | Windows Credential Manager |
| Electron | safeStorage | OS-level encryption via Chromium |
| Browser | AES-GCM via Web Crypto | Non-extractable CryptoKey |
Volatile Memory Injection
Credentials are injected into the Native Subscription Gateway as transient environment states within an isolated execution context. They exist only in volatile memory for the duration of the task and are purged upon process termination. There is no window during which credentials are written to disk, logged to stdout, or accessible to sibling processes.
The user explicitly consents to credential access through the operating system's native permission dialog. inference-relay cannot silently acquire credentials—the OS mediates every access request, and the user sees exactly which application is requesting access to the Hardware-Authorized Secure Enclave.
III. Process Sandboxing Architecture
Shell-Free Invocation
inference-relay invokes the Native Subscription Gateway using parameterized argument arrays passed directly to the binary. The system shell is never involved: no /bin/sh on Unix, no cmd.exe on Windows. Injection attacks that rely on shell metacharacter interpretation therefore have no interpreter to target; the attack class is ruled out by the architecture rather than by input filtering.
Shell injection requires a shell. inference-relay never invokes one. Arguments are parameterized arrays, never string-concatenated commands.
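Shell-free invocation can be illustrated with a minimal sketch using Node's child_process module. The helper name runGateway is hypothetical, and the Node executable stands in for the gateway binary, whose real name and arguments are not given in this document.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper (not the library's API): run a binary with an
// argument array and without any shell in between.
function runGateway(binary: string, args: readonly string[]): string {
  const result = spawnSync(binary, [...args], {
    shell: false, // Node's default, stated explicitly: no /bin/sh, no cmd.exe
    encoding: "utf8",
  });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`process exited with status ${result.status}`);
  }
  return result.stdout;
}

// Shell metacharacters reach the child as the literal text of one argument.
const hostile = "$(echo pwned); ls && cat /etc/passwd";
const echoed = runGateway(process.execPath, [
  "-e",
  "process.stdout.write(process.argv[process.argv.length - 1])",
  hostile,
]);
console.log(echoed === hostile);
```

Because no shell parses the argument, the command substitution and the chained commands are never evaluated. Argument arrays do not by themselves validate what the target binary does with each argument, so option parsing in the binary remains a separate concern.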
Standard I/O Pipe Isolation
The Native Gateway binary runs as an isolated process with a scoped environment. stdin, stdout, and stderr are discrete pipes—no shared memory, no shared state, no file descriptor leakage between the parent application and the execution context. The gateway cannot read the parent's memory space, and the parent consumes only the structured output written to its stdout pipe.
Process Lifecycle Management
Orphan prevention is enforced through deterministic process tracking. On parent exit, every tracked execution context receives a termination signal. There is no scenario in which an orphaned gateway process continues executing inference after the parent application terminates.
- Arguments: Parameterized arrays passed directly to the binary. No string concatenation, no template interpolation.
- Environment: Scoped per execution context. Credentials are injected as environment variables of the gateway process only; they are never exported to the parent's environment or passed to any other process.
- Teardown: Deterministic. Process tracking guarantees no orphans survive parent termination.
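The three properties listed above can be sketched together as a small process registry. This is an illustration under stated assumptions, not the library's implementation: the class name ProcessRegistry, its methods, and the DEMO_TOKEN variable are hypothetical, and the Node executable stands in for the gateway binary.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// Hypothetical registry: tracks live children and signals them on parent exit.
class ProcessRegistry {
  private readonly children = new Set<ChildProcess>();

  constructor() {
    // 'exit' handlers must be synchronous; kill() only sends a signal.
    process.once("exit", () => {
      this.terminateAll();
    });
  }

  spawnTracked(
    binary: string,
    args: readonly string[],
    scopedEnv: Record<string, string>,
  ): ChildProcess {
    const child = spawn(binary, [...args], {
      shell: false,
      env: scopedEnv, // only these variables; the parent's env is not inherited
      stdio: ["pipe", "pipe", "pipe"], // discrete stdin, stdout, stderr pipes
    });
    this.children.add(child);
    child.once("exit", () => {
      this.children.delete(child);
    });
    return child;
  }

  // Sends the default termination signal (SIGTERM) to every tracked child.
  terminateAll(): number {
    let signalled = 0;
    this.children.forEach((child) => {
      if (child.kill()) signalled += 1;
    });
    this.children.clear();
    return signalled;
  }

  get size(): number {
    return this.children.size;
  }
}

// Demo: start a long-running child, then tear it down.
const registry = new ProcessRegistry();
const child = registry.spawnTracked(
  process.execPath,
  ["-e", "setInterval(() => {}, 1000)"],
  { DEMO_TOKEN: "placeholder" },
);
const signalled = registry.terminateAll();
console.log(signalled, child.killed);
```

A handler on the parent's exit event does not run if the parent itself is killed with an uncatchable signal, so a complete design would need an additional mechanism for that case; this sketch does not attempt one.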
Computational Resource De-duplication
The Native Subscription Gateway identifies and suppresses redundant system-state markers to optimize the hand-off between the orchestration and execution domains. This computational resource de-duplication reduces overhead during the inference hand-off by 89–95%, verified through end-to-end testing with representative workloads.
IV. Asymmetric Logic Verification (RS256)
Signed Trust Chain
The inference-relay server signs all manifests and license validation payloads with an RSA-2048 private key. The client library verifies these signatures using an embedded public key. This creates an Asymmetric Decoupling: the library can verify that a payload originated from the legitimate server, but it can never sign payloads itself. The private key is air-gapped on secure infrastructure and never distributed.
This architecture provides Logic Tampering protection. A forged manifest—whether injected by a man-in-the-middle, a compromised CDN, or a malicious package registry—cannot silently alter library behavior. The RSA-2048 signature verification will reject it before any instructions are parsed.
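The verify-only property can be sketched with Node's crypto module. The SignedManifest shape and the function names are assumptions made for illustration, and a throwaway key pair generated at runtime stands in for the server's key; "RSA-SHA256" with Node's default padding corresponds to RS256.

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Hypothetical payload shape: manifest text plus a detached base64 signature.
interface SignedManifest {
  body: string;
  signature: string;
}

// Client side: needs only the public key, so it can verify but never sign.
function verifyManifest(manifest: SignedManifest, publicKeyPem: string): boolean {
  const verifier = createVerify("RSA-SHA256");
  verifier.update(manifest.body, "utf8");
  verifier.end();
  return verifier.verify(publicKeyPem, manifest.signature, "base64");
}

// Demo only: a throwaway RSA-2048 key pair in place of the real server key.
const { publicKey, privateKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});

// Server side (demo): sign the manifest body with the private key.
function signForDemo(body: string): SignedManifest {
  const signer = createSign("RSA-SHA256");
  signer.update(body, "utf8");
  signer.end();
  return { body, signature: signer.sign(privateKey, "base64") };
}

const genuine = signForDemo('{"outputFormat":"v2"}');
const forged: SignedManifest = { body: '{"outputFormat":"evil"}', signature: genuine.signature };

console.log(verifyManifest(genuine, publicKey)); // true
console.log(verifyManifest(forged, publicKey)); // false
```

The forged manifest reuses a valid signature over different content, which is what a tampering intermediary could produce without the private key, and verification rejects it before the body would be parsed.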
The manifest functions as a dynamic interoperability layer between two independently versioned systems, absorbing upstream interface evolution without requiring application-level code modifications or binary redistribution. When the native inference gateway releases a new version with a changed output format, the signed manifest updates automatically—your application continues working without a patch, a deploy, or a line of code changed.
Protocol Integrity Enforcement
The library maintains four security states that govern its behavior according to the outcome of signature verification:
| State | Trigger | Behavior |
|---|---|---|
| SEC_NOMINAL | Signature verified | Full operation, cache refreshed |
| SEC_CACHE | Verification unsuccessful | Operates from last-known-good cached configuration |
| SEC_RECOVERY | 1-2 consecutive verification failures | Staggered retry with 50-100ms jitter |
| SEC_DEGRADED | 3 consecutive verification failures | High-entropy recovery state, restricted operation |
The SEC_DEGRADED state is the terminal protection: after three consecutive verification failures, the library assumes the trust chain has been compromised. Asynchronous Timing Decorrelation isolates the library's internal security state from externally observable latency patterns, preventing timing-based attacks against the recovery mechanism. License validation employs adaptive timeout management: a wider initial window accommodates distributed infrastructure variability, while subsequent validations use a tighter heartbeat to maintain authorization freshness. Responses are cached with a hard maximum staleness threshold, so even during a prolonged network partition the library will not operate on arbitrarily stale authorization.
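The failure-count thresholds in the table can be sketched as a small state function. Several points are assumptions rather than documented behavior: the function names are hypothetical, SEC_CACHE is left out because the table does not give it a distinct failure count, and the sketch assumes a successful verification resets the failure counter.

```typescript
// Only the three states with an unambiguous trigger in the table are modeled.
type SecurityState = "SEC_NOMINAL" | "SEC_RECOVERY" | "SEC_DEGRADED";

const DEGRADED_THRESHOLD = 3; // from the table: 3 consecutive failures

function stateFor(consecutiveFailures: number): SecurityState {
  if (consecutiveFailures <= 0) return "SEC_NOMINAL";
  if (consecutiveFailures < DEGRADED_THRESHOLD) return "SEC_RECOVERY";
  return "SEC_DEGRADED";
}

// Retry delay with jitter, an integer in the 50-100 ms range from the table.
function retryDelayMs(random: () => number = Math.random): number {
  return 50 + Math.floor(random() * 51);
}

// Replays a sequence of verification outcomes (true = signature verified).
// Assumption: one success resets the consecutive-failure counter.
function replay(outcomes: readonly boolean[]): SecurityState[] {
  let failures = 0;
  return outcomes.map((verified) => {
    failures = verified ? 0 : failures + 1;
    return stateFor(failures);
  });
}

console.log(replay([true, false, false, false, true]));
console.log(retryDelayMs());
```

Accepting the random source as a parameter keeps the jitter testable with a fixed value while defaulting to Math.random in normal use.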
V. The Dumb Pipe Guarantee
Type-Level Enforcement of Privacy
inference-relay's telemetry schema contains a Static Analysis Guardrail that makes accidental content logging a compile-time error, not a runtime oversight. The AuditEvent interface declares two content fields, promptContent and completionContent, with literal types.
promptContent and completionContent are typed as the literal value false, not as boolean or string. A developer cannot “accidentally” assign prompt text to these fields because the TypeScript compiler will reject it. The compiler acts as an automated security auditor—the type system makes content leakage a static analysis failure, not a code review finding.
Telemetry transmits only: provider name, model identifier, token counts, estimated cost, request duration, and fallback status. Zero content fields exist in the schema. There is no mechanism—accidental or intentional—to attach prompt or completion data to a telemetry event without modifying the type definition itself, which is a tracked, reviewable change.
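The interface declaration itself does not appear in this document, so the following is a hedged reconstruction. Only promptContent and completionContent and their literal false types are taken from the text; every other field name is an assumption chosen to match the telemetry items listed above.

```typescript
// Sketch of an audit event schema. Field names other than promptContent and
// completionContent are assumed for illustration.
interface AuditEvent {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  estimatedCostUsd: number;
  durationMs: number;
  usedFallback: boolean;
  promptContent: false; // literal type: the only assignable value is false
  completionContent: false; // literal type: the only assignable value is false
}

const auditEvent: AuditEvent = {
  provider: "example-provider",
  model: "example-model",
  inputTokens: 120,
  outputTokens: 45,
  estimatedCostUsd: 0.0012,
  durationMs: 830,
  usedFallback: false,
  promptContent: false,
  completionContent: false,
};

// The directive below passes type-checking only because the next line is a
// type error: a string is not assignable to the literal type false.
// @ts-expect-error
const rejected: AuditEvent = { ...auditEvent, promptContent: "user prompt text" };
void rejected;

console.log(auditEvent.promptContent, auditEvent.completionContent);
```

TypeScript types are erased at runtime, so the check applies to code that passes through the compiler; it is a build-time control rather than a runtime filter.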
inference-relay replaces administrative trust with structural enforcement. This guarantee is categorically distinct from runtime content filters or policy-based access controls. A literal type constraint is checked at compile time, so it holds regardless of runtime configuration or administrative override; any code modification that would carry content must first pass the compiler again.
Binary String Entropy Scan
The CI pipeline includes an entropy scanner that examines compiled JavaScript output for high-entropy strings that could indicate inadvertently embedded content, API keys, or credential material. This provides a secondary verification layer beyond the type system: even if a developer circumvented the TypeScript guardrail through type assertions, the compiled output would trigger the entropy scan before reaching production.
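An entropy scan of this kind is commonly built on Shannon entropy over candidate tokens. The sketch below is one plausible shape, not the pipeline's actual scanner: the function names, the 20-character minimum, and the 4.5 bits-per-character threshold are all assumptions.

```typescript
// Shannon entropy in bits per character (UTF-16 code units).
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;
  const counts = new Map<string, number>();
  for (let i = 0; i < text.length; i += 1) {
    const ch = text.charAt(i);
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  let entropy = 0;
  counts.forEach((n) => {
    const p = n / text.length;
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

// Candidate tokens: runs of 20 or more base64/URL-safe characters.
const CANDIDATE = /[A-Za-z0-9+\/=_-]{20,}/g;

// Returns candidates whose entropy exceeds the (assumed) threshold.
function findHighEntropyStrings(source: string, threshold = 4.5): string[] {
  const candidates = source.match(CANDIDATE) || [];
  return candidates.filter((token) => shannonEntropy(token) > threshold);
}

// Demo input: one repetitive literal and one key-like literal.
const compiledOutput = [
  'const label = "aaaaaaaaaaaaaaaaaaaaaaaa";',
  'const key = "Zx8Qm2LpV7rT4nKc9WbY1sGd6HfJ3uEa";',
].join("\n");

console.log(findHighEntropyStrings(compiledOutput));
```

The repetitive literal scores zero bits per character and is ignored, while the 32-character literal with no repeated characters scores 5 bits and is flagged. Thresholds like these trade false positives against misses and would need tuning on real build output.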
Application Logic Opacity
The Dumb Pipe Guarantee extends beyond inference-relay's own telemetry to the developer's proprietary logic. The relay operates as a Zero-Knowledge Orchestrator—it routes inference requests without visibility into the developer's system prompts, tool schemas, function call arguments, or routing configuration. These artifacts exist exclusively within the developer's server-side application and are never transmitted to, cached by, or accessible through the relay.
End users may observe operational metadata (provider name, model identifier, token counts) and the embedded RSA public verification key. Neither exposes proprietary logic. The public key enables signature verification only—it cannot sign responses, forge license validations, or elevate tier entitlements. License keys are scoped to the holder's own telemetry and audit trail; cross-tenant data access is structurally impossible.
This separation is architectural, not policy-based. The relay binary contains no application-specific logic to extract. It is a generic orchestration primitive—the developer's intellectual property (system prompts, tool definitions, routing rules, business logic) resides entirely in their own infrastructure.
VI. Compliance Positioning
Direct Subscription Utilization (DSU)
inference-relay implements what we term Direct Subscription Utilization: the library enables employees to use AI capabilities through the organization's existing Anthropic or OpenAI data processing agreements. Rather than introducing a new data processor into your supply chain, inference-relay brings Shadow AI (the use of personal AI subscriptions for work tasks, outside corporate visibility and governance) into compliance by routing it through subscriptions your legal team has already vetted.
Client-Side Software Classification
Because inference-relay operates as a Local Binary Dependency—executing entirely on the end user's machine with no server-side prompt processing—it qualifies as Client-Side Software rather than a Cloud Service under most procurement frameworks. This distinction simplifies vendor onboarding significantly:
- No new DPA required. inference-relay does not process, store, or have access to prompt or completion content. The data processing relationship remains exclusively between the user and their AI provider.
- No new data processor introduced. Your existing Anthropic DPA governs the inference. inference-relay is a local tool, not a data intermediary.
- Simplified SOC 2 scoping. Client-side dependencies that do not process customer data fall outside the trust boundary for most audit frameworks.
- Platform Integrity. inference-relay utilizes official binary protocols, ensuring that the AI provider's internal safety filters and prompt caching remain active and effective.
Risk Transfer
The security model creates a clean Risk Transfer boundary. inference-relay is responsible for: orchestration correctness, credential isolation, process sandboxing, and manifest integrity. inference-relay is not responsible for—and has no access to—prompt content, completion content, model behavior, or data retention. Those responsibilities remain with the AI provider under your existing agreements.
Tamper-Evident Audit Chain
The audit trail forms a tamper-evident hash chain ensuring operational non-repudiation. Each entry includes a cryptographic hash computed over the previous entry's hash and the current entry's metadata. Modification of any historical entry invalidates all subsequent hashes, allowing auditors to verify the integrity of the resource-allocation log without exposing content-level data.
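The chaining rule described above can be sketched as follows. SHA-256, the all-zero genesis value, the newline separator, and all names are assumptions; the document specifies only that each hash covers the previous entry's hash and the current entry's metadata.

```typescript
import { createHash } from "node:crypto";

interface ChainEntry {
  metadata: string; // serialized, content-free metadata
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // assumed placeholder for the first entry

// Hash over the previous entry's hash and the current entry's metadata.
function entryHash(prevHash: string, metadata: string): string {
  return createHash("sha256").update(prevHash).update("\n").update(metadata).digest("hex");
}

function appendEntry(chain: readonly ChainEntry[], metadata: string): ChainEntry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { metadata, prevHash, hash: entryHash(prevHash, metadata) }];
}

// Returns the index of the first entry that fails verification, or -1 if intact.
function firstInvalidIndex(chain: readonly ChainEntry[]): number {
  let expectedPrev = GENESIS;
  for (let i = 0; i < chain.length; i += 1) {
    const entry = chain[i];
    if (entry.prevHash !== expectedPrev) return i;
    if (entry.hash !== entryHash(expectedPrev, entry.metadata)) return i;
    expectedPrev = entry.hash;
  }
  return -1;
}

let chain: ChainEntry[] = [];
chain = appendEntry(chain, '{"model":"m1","tokens":100}');
chain = appendEntry(chain, '{"model":"m1","tokens":250}');
chain = appendEntry(chain, '{"model":"m2","tokens":80}');

// Tamper with the middle entry's metadata without recomputing any hash.
const tampered = chain.map((e, i) => (i === 1 ? { ...e, metadata: '{"model":"m1","tokens":1}' } : e));

console.log(firstInvalidIndex(chain)); // -1
console.log(firstInvalidIndex(tampered)); // 1
```

An attacker who rewrites an entry must also recompute every later hash, so an auditor holding the latest hash from an independent source can detect the change without seeing any content-level data.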
Granular Resource Governance
The system maintains structurally separate cost attribution for each authorization domain, enabling granular resource governance and verifiable accounting of which domain absorbed which portion of the computational workload. The orchestration domain cost and the execution domain cost are recorded independently in every audit event.
Enterprise Audit Access
For organizations requiring source code review prior to adoption, inference-relay offers a structured audit pathway:
- NDA workflow: Standard mutual NDA executed prior to access grant.
- 48-hour read-only repository access: Your security team receives time-boxed access to the full source repository for independent verification of every claim in this document.
- Reproducible builds: Published packages can be verified against source through deterministic build output.
VII. Summary of Guarantees
| Property | Guarantee | Enforcement |
|---|---|---|
| Prompt privacy | Zero content transmission | Type-level literal false fields |
| Credential storage | Zero persistent storage | OS secure enclave, volatile injection |
| Shell injection | Ruled out by architecture | Parameterized binary invocation bypasses shell entirely |
| Manifest integrity | RSA-2048 signed trust chain | Asymmetric verification, air-gapped key |
| Process isolation | Discrete I/O channels, scoped env | No shared memory or state |
| Zombie prevention | Deterministic teardown | Process registry with platform-appropriate termination signals |
| License cache staleness | 7-day hard maximum | SEC_DEGRADED after 3 failures |
| Content leakage in CI | Binary entropy scanning | Automated high-entropy string detection |
| Application logic opacity | Zero-knowledge orchestration | No application-specific logic in relay binary; IP stays server-side |
Every guarantee in this document is a verifiable architectural property, not a policy promise. The type system enforces privacy. The process model enforces isolation. The cryptographic chain enforces integrity. These properties hold regardless of developer intent, configuration errors, or runtime conditions.