Enterprise Security Whitepaper

This page documents inference-relay v1.0 (npm library). For the standalone daemon (v1.1+), see the v1.1 Daemon Whitepaper.

Classification

This document describes the Data Sovereignty Architecture of inference-relay for enterprise security review. It is intended for CTOs, CISOs, and security auditors evaluating inference-relay as a dependency. All claims are verifiable under NDA via our 48-hour read-only repository access program.

Prepared: April 6, 2026
Author: inference-relay security team
Scope: Credential handling, process isolation, cryptographic verification, telemetry boundaries, compliance posture
Version: 1.0
PATENT PENDING

I. Executive Summary

inference-relay implements a Data Sovereignty Architecture that fundamentally changes the security posture of AI-assisted applications. The core guarantee: inference-relay eliminates the “Third-Party Data Processor” risk by utilizing a Zero-Interception Routing model.

The library routes inference calls from a developer's application to the end user's existing AI subscription (Claude Pro, Claude Max, Claude Team, Claude Enterprise, or OpenAI equivalents). No prompt content, no completion content, and no conversation context touches inference-relay servers at any point in the request lifecycle. The developer pays inference-relay a small fee for orchestration metadata only. The user's own subscription handles model execution through the provider's own infrastructure.

The practical consequence for your organization: adopting inference-relay reduces your attack surface relative to any architecture that routes prompts through a third-party API gateway. There is no new data processor to vet, no new DPA to negotiate, no new vector for prompt exfiltration. The inference path runs entirely on the user's machine, through the user's credentials, to the user's provider.

inference-relay is not a proxy. It is a local binary dependency that orchestrates execution on infrastructure your organization already trusts.

II. Hardware-Bound Credential Isolation

OS-Mediated Consent

inference-relay maintains zero persistent credential storage of its own. It does not write tokens to disk, embed them in configuration files, or cache them in application memory beyond the lifetime of a single task. Instead, it requests a transient session token from the operating system's secure credential store at the moment of use. The store used on each platform is listed below.

Platform	Secure Storage	Mechanism
macOS	Hardware-Authorized Secure Enclave	OS-mediated secure storage API
Linux	libsecret / credential file	D-Bus Secret Service
Windows	PasswordVault	Windows Credential Manager
Electron	safeStorage	OS-level encryption via Chromium
Browser	AES-GCM via Web Crypto	Non-extractable CryptoKey

Volatile Memory Injection

Credentials are injected into the Native Subscription Gateway as transient environment states within an isolated execution context. They exist only in volatile memory for the duration of the task and are purged upon process termination. There is no window during which credentials are written to disk, logged to stdout, or accessible to sibling processes.
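The idea of holding a credential only for the duration of one task can be approximated in application code as follows. This is a minimal sketch, not inference-relay's implementation: the helper name `withTransientSecret` and the sample bytes are hypothetical, and it assumes the credential arrives as a byte buffer (JavaScript strings are immutable and cannot be wiped in place).

```typescript
// Hypothetical helper for illustration; not part of the inference-relay API.
// Runs a task with a secret held in a byte buffer, then overwrites the buffer.
function withTransientSecret<T>(
  secret: Uint8Array,
  task: (secret: Uint8Array) => T,
): T {
  try {
    return task(secret);
  } finally {
    // Best-effort wipe: zero the bytes whether the task succeeded or threw.
    secret.fill(0);
  }
}

// Example: the task only reads the length; afterwards the buffer is all zeros.
const sessionToken = Uint8Array.from([0x74, 0x6f, 0x6b]);
const observedLength = withTransientSecret(sessionToken, (bytes) => bytes.length);
```

Zeroing a buffer is a best-effort measure in a garbage-collected runtime: copies made by the task itself, or by the runtime, are outside the helper's control.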

The user explicitly consents to credential access through the operating system's native permission dialog. inference-relay cannot silently acquire credentials—the OS mediates every access request, and the user sees exactly which application is requesting access to the Hardware-Authorized Secure Enclave.

III. Process Sandboxing Architecture

Shell-Free Invocation

inference-relay invokes the Native Subscription Gateway using parameterized argument arrays passed directly to the binary. The system shell is never involved: no /bin/sh on Unix, no cmd.exe on Windows. Injection attacks that depend on shell metacharacter interpretation therefore have nothing to target, because argument values are never parsed by a shell.

Shell injection requires a shell. inference-relay never invokes one. Arguments are parameterized arrays, never string-concatenated commands.
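Shell-free invocation can be sketched with Node's `child_process.spawnSync` and an argument array. The helper name `runWithoutShell` and the use of the Node binary as a stand-in for the gateway are assumptions made for this illustration; the library's actual invocation code is not shown in this document.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper for illustration; not part of the inference-relay API.
// Each argument is passed to the binary as-is; no shell parses the values.
function runWithoutShell(binary: string, args: readonly string[]): string {
  const result = spawnSync(binary, [...args], { shell: false, encoding: "utf8" });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`exit ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}

// A value full of shell metacharacters arrives in the child verbatim.
const hostileArgument = "hello; echo INJECTED && $(whoami)";
const echoedArgument = runWithoutShell(process.execPath, [
  "-e",
  "process.stdout.write(process.argv[process.argv.length - 1])",
  hostileArgument,
]);
```

Because the child receives the string as a single argv entry, the `;`, `&&`, and `$(...)` sequences are never interpreted as commands.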

Standard I/O Pipe Isolation

The Native Gateway binary runs as an isolated process with a scoped environment. stdin, stdout, and stderr are discrete pipes—no shared memory, no shared state, no file descriptor leakage between the parent application and the execution context. The gateway cannot read the parent's memory space, and the parent consumes only the structured output written to its stdout pipe.
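A scoped environment with discrete pipes can be illustrated with standard Node APIs. In this sketch the variable names `GATEWAY_TOKEN` and `PARENT_ONLY_SECRET` are hypothetical, and the Node binary stands in for the gateway; the child receives only the environment passed to it and reports back over its stdout pipe.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical variable names, used only for this illustration.
process.env.PARENT_ONLY_SECRET = "must-not-leak";

const childScript =
  "process.stdout.write(JSON.stringify({" +
  " token: process.env.GATEWAY_TOKEN || null," +
  " leaked: process.env.PARENT_ONLY_SECRET || null }))";

const scopedRun = spawnSync(process.execPath, ["-e", childScript], {
  shell: false,
  encoding: "utf8",
  // Only this object becomes the child's environment; the parent's is not inherited.
  env: { GATEWAY_TOKEN: "transient-token" },
  // Separate pipes for stdin, stdout, and stderr.
  stdio: ["pipe", "pipe", "pipe"],
});

// The parent consumes only the structured output written to the stdout pipe.
const childView = JSON.parse(scopedRun.stdout) as {
  token: string | null;
  leaked: string | null;
};
```

On Windows a child may need a few system variables (such as `SystemRoot`) added to the scoped environment in order to start; that detail is omitted here.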

Process Lifecycle Management

Orphan prevention is enforced through deterministic process tracking. On parent exit, every tracked execution context receives a termination signal. There is no scenario in which an orphaned gateway process continues executing inference after the parent application terminates.

Arguments: Parameterized arrays passed directly to the binary. No string concatenation, no template interpolation.
Environment: Scoped per execution context. Credentials are injected as environment variables of that context only and are not inherited by other processes on the system.
Teardown: Deterministic. Process tracking guarantees no orphans survive parent termination.
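Deterministic process tracking of the kind summarized above might look like the following. This is a sketch under assumptions: the class name `ProcessRegistry` and its methods are hypothetical, and the Node binary again stands in for the gateway.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// Hypothetical registry for illustration; not part of the inference-relay API.
class ProcessRegistry {
  private readonly tracked = new Set<ChildProcess>();

  // Remember the child and forget it again once it exits on its own.
  track(child: ChildProcess): ChildProcess {
    this.tracked.add(child);
    child.once("exit", () => this.tracked.delete(child));
    return child;
  }

  // Send a termination signal to every tracked child; returns how many were signalled.
  terminateAll(signal: NodeJS.Signals = "SIGTERM"): number {
    let signalled = 0;
    this.tracked.forEach((child) => {
      if (child.kill(signal)) signalled += 1;
    });
    this.tracked.clear();
    return signalled;
  }
}

const registry = new ProcessRegistry();
// On parent exit, every tracked execution context receives a termination signal.
process.once("exit", () => registry.terminateAll());

const worker = registry.track(
  spawn(process.execPath, ["-e", "setInterval(() => {}, 1000)"], {
    shell: false,
    stdio: "ignore",
  }),
);
const signalledCount = registry.terminateAll();
```

An `exit` hook does not run if the parent itself is killed with an uncatchable signal such as SIGKILL, so this sketch alone does not cover that case; how the library handles it is not described in this document.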

Computational Resource De-duplication

The Native Subscription Gateway identifies and suppresses redundant system-state markers to optimize the hand-off between the orchestration and execution domains. This computational resource de-duplication reduces overhead during the inference hand-off by 89–95%, verified through end-to-end testing with representative workloads.

IV. Asymmetric Logic Verification (RS256)

Signed Trust Chain

The inference-relay server signs all manifests and license validation payloads with an RSA-2048 private key. The client library verifies these signatures using an embedded public key. This creates an Asymmetric Decoupling: the library can verify that a payload originated from the legitimate server, but it can never sign payloads itself. The private key is air-gapped on secure infrastructure and never distributed.

This architecture provides Logic Tampering protection. A forged manifest—whether injected by a man-in-the-middle, a compromised CDN, or a malicious package registry—cannot silently alter library behavior. The RSA-2048 signature verification will reject it before any instructions are parsed.
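The verify-before-parse order described above can be sketched with Node's built-in `crypto` module. The key pair is generated locally as a stand-in (in the design described here only the public key would ship with the client), and the `Manifest` shape and function names are hypothetical.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Stand-in RSA-2048 key pair for the demo only.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

// Hypothetical manifest shape, for illustration only.
interface Manifest {
  version: string;
  outputFormat: string;
}

// Server side: RSASSA-PKCS1-v1_5 with SHA-256, i.e. RS256.
function signManifest(payload: Buffer, key: KeyObject): Buffer {
  return sign("sha256", payload, key);
}

// Client side: the payload is parsed only after the signature checks out.
function parseVerifiedManifest(payload: Buffer, signature: Buffer, key: KeyObject): Manifest {
  if (!verify("sha256", payload, key, signature)) {
    throw new Error("manifest signature rejected");
  }
  return JSON.parse(payload.toString("utf8")) as Manifest;
}

const genuinePayload = Buffer.from(JSON.stringify({ version: "1.0", outputFormat: "json" }));
const genuineSignature = signManifest(genuinePayload, privateKey);
const verifiedManifest = parseVerifiedManifest(genuinePayload, genuineSignature, publicKey);

// A forged payload reusing the genuine signature is rejected before parsing.
const forgedPayload = Buffer.from(JSON.stringify({ version: "1.0", outputFormat: "evil" }));
let forgedRejected = false;
try {
  parseVerifiedManifest(forgedPayload, genuineSignature, publicKey);
} catch {
  forgedRejected = true;
}
```

Holding only the public key, the client can check signatures but cannot produce one, which is the asymmetry the section relies on.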

The manifest functions as a dynamic interoperability layer between two independently versioned systems, absorbing upstream interface evolution without requiring application-level code modifications or binary redistribution. When the native inference gateway releases a new version with a changed output format, the signed manifest updates automatically—your application continues working without a patch, a deploy, or a line of code changed.

Protocol Integrity Enforcement

The library maintains four security states that govern its behavior depending on the outcome of signature verification:

State	Trigger	Behavior
SEC_NOMINAL	Signature verified	Full operation, cache refreshed
SEC_CACHE	Verification unsuccessful	Operates from last-known-good cached configuration
SEC_RECOVERY	1-2 consecutive verification failures	Staggered retry with 50-100ms jitter
SEC_DEGRADED	3 consecutive verification failures	High-entropy recovery state, restricted operation

The SEC_DEGRADED state is the terminal protection: after three consecutive verification failures, the library assumes the trust chain has been compromised. Asynchronous Timing Decorrelation isolates the library's internal security state from externally observable latency patterns, preventing timing-based attacks against the recovery mechanism. License validation employs adaptive timeout management: a wider initial window accommodates distributed infrastructure variability, while subsequent validations use a tighter heartbeat to maintain authorization freshness. Responses are cached with a hard maximum staleness threshold of 7 days, so even during a prolonged network partition the library will not operate on arbitrarily stale authorization.
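The failure-count thresholds, retry jitter, and cache staleness limit can be expressed as a few small functions. This is a sketch under assumptions: all names are hypothetical, the 7-day figure is taken from the summary table in Section VII, and SEC_CACHE is left out because the table does not specify how it interleaves with the consecutive-failure counter.

```typescript
// Hypothetical names for illustration; not part of the inference-relay API.
type SecurityState = "SEC_NOMINAL" | "SEC_RECOVERY" | "SEC_DEGRADED";

const DEGRADED_THRESHOLD = 3; // consecutive failures, per the state table
const MAX_CACHE_AGE_MS = 7 * 24 * 60 * 60 * 1000; // 7-day hard maximum

// Map the number of consecutive verification failures to a state.
function stateFor(consecutiveFailures: number): SecurityState {
  if (consecutiveFailures <= 0) return "SEC_NOMINAL";
  if (consecutiveFailures < DEGRADED_THRESHOLD) return "SEC_RECOVERY";
  return "SEC_DEGRADED";
}

// Staggered retry: a delay drawn uniformly from [50, 100) milliseconds.
// The random source is injectable so the function can be tested deterministically.
function retryDelayMs(random: () => number = Math.random): number {
  return 50 + random() * 50;
}

// A cached response is usable only while it is no older than the hard maximum.
function cacheUsable(cachedAtMs: number, nowMs: number): boolean {
  const ageMs = nowMs - cachedAtMs;
  return ageMs >= 0 && ageMs <= MAX_CACHE_AGE_MS;
}
```

Keeping the thresholds as named constants makes the relationship between the state table and the code easy to audit.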

V. The Dumb Pipe Guarantee

Type-Level Enforcement of Privacy

inference-relay's telemetry schema contains a Static Analysis Guardrail that makes accidental content logging a compile-time error, not a runtime oversight. The AuditEvent interface declares:

// TypeScript literal types as security enforcement
interface AuditEvent {
  promptContent: false;     // literal type: any value other than false is a compile error
  completionContent: false; // literal type: any value other than false is a compile error
  // ... operational metadata fields (no content fields exist)
}

promptContent and completionContent are typed as the literal value false, not as boolean or string. A developer cannot “accidentally” assign prompt text to these fields because the TypeScript compiler will reject it. The compiler acts as an automated security auditor—the type system makes content leakage a static analysis failure, not a code review finding.

Telemetry transmits only: provider name, model identifier, token counts, estimated cost, request duration, and fallback status. Zero content fields exist in the schema. There is no mechanism—accidental or intentional—to attach prompt or completion data to a telemetry event without modifying the type definition itself, which is a tracked, reviewable change.
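The compile-time rejection can be demonstrated with a fuller version of the interface. The metadata field names below are assumptions derived from the list above (provider, model, token counts, cost, duration, fallback status); the actual schema is not reproduced in this document.

```typescript
// Field names other than promptContent and completionContent are hypothetical.
interface AuditEvent {
  promptContent: false;
  completionContent: false;
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  estimatedCostUsd: number;
  durationMs: number;
  usedFallback: boolean;
}

// A valid event: the content fields can only ever hold the literal false.
const auditEvent: AuditEvent = {
  promptContent: false,
  completionContent: false,
  provider: "anthropic",
  model: "example-model",
  inputTokens: 120,
  outputTokens: 48,
  estimatedCostUsd: 0.002,
  durationMs: 850,
  usedFallback: false,
};

const userPrompt: string = "confidential text";
// The directive below documents that the next line fails type checking.
// @ts-expect-error Type 'string' is not assignable to type 'false'.
const rejectedEvent: AuditEvent = { ...auditEvent, promptContent: userPrompt };
```

The `@ts-expect-error` directive makes the example compile only because the next line is a type error; removing the directive surfaces the compiler's rejection directly.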

inference-relay replaces administrative trust with a structural constraint. This guarantee is categorically distinct from runtime content filters or policy-based access controls. A literal type constraint is checked at compile time, so it holds regardless of runtime configuration or administrative override, and any code change that would weaken it must pass back through the compiler and code review.

Binary String Entropy Scan

The CI pipeline includes an entropy scanner that examines compiled JavaScript output for high-entropy strings that could indicate inadvertently embedded content, API keys, or credential material. This provides a secondary verification layer beyond the type system: even if a developer circumvented the TypeScript guardrail through type assertions, the compiled output would trigger the entropy scan before reaching production.
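An entropy scan of this kind is commonly built on Shannon entropy over the characters of each string literal. The sketch below is an assumption about the general technique, not the pipeline's actual scanner: the function names, the 4.5 bits-per-character threshold, and the 20-character minimum are all illustrative choices.

```typescript
// Shannon entropy in bits per character.
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;
  const counts = new Map<string, number>();
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let bits = 0;
  counts.forEach((count) => {
    const p = count / text.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

// Quoted runs of base64-like characters are the candidates to score.
const QUOTED_TOKEN = /["'`]([A-Za-z0-9+\/=_-]+)["'`]/g;

// Hypothetical scanner: returns literals that are long and high-entropy.
function findHighEntropyLiterals(
  source: string,
  thresholdBits = 4.5,
  minLength = 20,
): string[] {
  const hits: string[] = [];
  QUOTED_TOKEN.lastIndex = 0;
  let match: RegExpExecArray | null;
  while ((match = QUOTED_TOKEN.exec(source)) !== null) {
    const token = match[1] ?? "";
    if (token.length >= minLength && shannonEntropy(token) >= thresholdBits) {
      hits.push(token);
    }
  }
  return hits;
}

// One repetitive literal (entropy 0) and one with 32 distinct characters (entropy 5).
const compiledBundle = [
  'const padding = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";',
  'const key = "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345";',
].join("\n");
const flaggedLiterals = findHighEntropyLiterals(compiledBundle);
```

Thresholds of this kind trade false positives against missed secrets, so a production scanner would normally tune them per character set and maintain an allowlist.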

Application Logic Opacity

The Dumb Pipe Guarantee extends beyond inference-relay's own telemetry to the developer's proprietary logic. The relay operates as a Zero-Knowledge Orchestrator—it routes inference requests without visibility into the developer's system prompts, tool schemas, function call arguments, or routing configuration. These artifacts exist exclusively within the developer's server-side application and are never transmitted to, cached by, or accessible through the relay.

End users may observe operational metadata (provider name, model identifier, token counts) and the embedded RSA public verification key. Neither exposes proprietary logic. The public key enables signature verification only—it cannot sign responses, forge license validations, or elevate tier entitlements. License keys are scoped to the holder's own telemetry and audit trail; cross-tenant data access is structurally impossible.

This separation is architectural, not policy-based. The relay binary contains no application-specific logic to extract. It is a generic orchestration primitive—the developer's intellectual property (system prompts, tool definitions, routing rules, business logic) resides entirely in their own infrastructure.

VI. Compliance Positioning

Direct Subscription Utilization (DSU)

inference-relay implements what we term Direct Subscription Utilization: the library enables employees to use AI capabilities through the organization's existing Anthropic or OpenAI data processing agreements. Rather than introducing a new data processor into your supply chain, inference-relay brings Shadow AI (the use of personal AI subscriptions for work tasks, outside corporate visibility and governance) into compliance by routing it through subscriptions your legal team has already vetted.

Client-Side Software Classification

Because inference-relay operates as a Local Binary Dependency—executing entirely on the end user's machine with no server-side prompt processing—it qualifies as Client-Side Software rather than a Cloud Service under most procurement frameworks. This distinction simplifies vendor onboarding significantly:

No new DPA required. inference-relay does not process, store, or have access to prompt or completion content. The data processing relationship remains exclusively between the user and their AI provider.
No new data processor introduced. Your existing Anthropic or OpenAI DPA governs the inference. inference-relay is a local tool, not a data intermediary.
Simplified SOC 2 scoping. Client-side dependencies that do not process customer data fall outside the trust boundary for most audit frameworks.
Platform Integrity. inference-relay utilizes official binary protocols, ensuring that the AI provider's internal safety filters and prompt caching remain active and effective.

Risk Transfer

The security model creates a clean Risk Transfer boundary. inference-relay is responsible for: orchestration correctness, credential isolation, process sandboxing, and manifest integrity. inference-relay is not responsible for—and has no access to—prompt content, completion content, model behavior, or data retention. Those responsibilities remain with the AI provider under your existing agreements.

Tamper-Evident Audit Chain

The audit trail forms a tamper-evident hash chain ensuring operational non-repudiation. Each entry includes a cryptographic hash computed over the previous entry's hash and the current entry's metadata. Modification of any historical entry invalidates all subsequent hashes, allowing auditors to verify the integrity of the resource-allocation log without exposing content-level data.
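A hash chain with these properties can be sketched in a few lines using SHA-256 from Node's `crypto` module. The entry shape, the all-zero genesis value, and the function names are assumptions for illustration; the document does not state which hash function or serialization the audit trail uses.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape; metadata is a serialized record with no content fields.
interface ChainEntry {
  metadata: string;
  prevHash: string;
  hash: string;
}

// Assumed starting value for the first entry's predecessor.
const GENESIS_HASH = "0".repeat(64);

// Hash over the previous entry's hash and the current entry's metadata.
function entryHash(prevHash: string, metadata: string): string {
  return createHash("sha256").update(prevHash).update("\n").update(metadata).digest("hex");
}

function appendEntry(chain: readonly ChainEntry[], metadata: string): ChainEntry[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS_HASH;
  return [...chain, { metadata, prevHash, hash: entryHash(prevHash, metadata) }];
}

// Returns the index of the first entry that fails verification, or -1 if the chain is intact.
function firstInvalidIndex(chain: readonly ChainEntry[]): number {
  let prevHash = GENESIS_HASH;
  return chain.findIndex((entry) => {
    const ok =
      entry.prevHash === prevHash && entry.hash === entryHash(prevHash, entry.metadata);
    prevHash = entry.hash;
    return !ok;
  });
}

let auditChain: ChainEntry[] = [];
auditChain = appendEntry(auditChain, '{"model":"example-model","inputTokens":120}');
auditChain = appendEntry(auditChain, '{"model":"example-model","inputTokens":64}');
auditChain = appendEntry(auditChain, '{"model":"example-model","inputTokens":300}');

// Rewriting a historical entry's metadata breaks verification at that entry.
const tamperedChain = auditChain.map((entry, i) =>
  i === 1 ? { ...entry, metadata: '{"model":"example-model","inputTokens":1}' } : entry,
);
```

If an attacker also recomputes the tampered entry's own hash, the mismatch moves to the next entry's `prevHash`, so the alteration still surfaces unless every later entry is rewritten as well.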

Granular Resource Governance

The system maintains structurally separate cost attribution for each authorization domain, enabling granular resource governance and verifiable accounting of which domain absorbed which portion of the computational workload. The orchestration domain cost and the execution domain cost are recorded independently in every audit event.

Enterprise Audit Access

For organizations requiring source code review prior to adoption, inference-relay offers a structured audit pathway:

NDA workflow: Standard mutual NDA executed prior to access grant.
48-hour read-only repository access: Your security team receives time-boxed access to the full source repository for independent verification of every claim in this document.
Reproducible builds: Published packages can be verified against source through deterministic build output.

VII. Summary of Guarantees

Property	Guarantee	Enforcement
Prompt privacy	Zero content transmission	Type-level literal false fields
Credential storage	Zero persistent storage	OS secure enclave, volatile injection
Shell injection	No shell in the invocation path	Parameterized binary invocation bypasses shell entirely
Manifest integrity	RSA-2048 signed trust chain	Asymmetric verification, air-gapped key
Process isolation	Discrete I/O channels, scoped env	No shared memory or state
Orphan prevention	Deterministic teardown	Process registry with platform-appropriate termination signals
License cache staleness	7-day hard maximum	SEC_DEGRADED after 3 failures
Content leakage in CI	Binary entropy scanning	Automated high-entropy string detection
Application logic opacity	Zero-knowledge orchestration	No application-specific logic in relay binary; IP stays server-side

Every guarantee in this document is a verifiable architectural property, not a policy promise. The type system enforces privacy. The process model enforces isolation. The cryptographic chain enforces integrity. These properties hold regardless of developer intent, configuration errors, or runtime conditions.

For enterprise audit requests, contact security@inference-relay.com. 48-hour read-only repository access is available under mutual NDA. All architectural claims in this document are verifiable against source.