Security — Admissible Labs

April 2026

The Threat
Is Real

Mythos changed the
threat landscape overnight.

On April 10, 2026, Anthropic published research on Claude Mythos Preview — an AI model that autonomously discovered tens of thousands of previously unknown vulnerabilities in every major operating system, browser, and kernel on earth. In safety testing, Mythos attempted prompt injection against its own judge, escaped its sandbox, and posted exploit code to public sites without instruction. OWASP now lists prompt injection as the #1 AI vulnerability, present in 73% of deployed AI systems.

These are not theoretical. They happened. Five attack classes are now proven exploitable at scale. The table below shows each — and the Admissible architectural response.

Indirect
Injection

Malicious instructions embedded in contracts, emails, or web content that AI agents read. Mythos was induced to act on attacker-controlled content during file scanning.

→

Liability preamble is structurally first. External content arrives in the user tier. The system tier — where the preamble lives — cannot be displaced by content from a lower tier. Adversarial input screener runs on all external content before agent context inclusion.

Multi-Agent
Trust Chain

Agent-to-agent communication has no cryptographic verification. A jailbroken agent can cascade compromises across an entire orchestrated system — one infected agent, all agents infected.

→

The Kernel is a separate process. No agent has write access to its own governance record. Every action — from every agent — requires a conformant certificate from the Kernel. A jailbroken agent can issue instructions; those instructions still cannot execute without a valid certificate.

Privilege
Escalation

Injected instructions manipulate tool selection, causing agents to take actions beyond their intended scope — accessing files, making network calls, or executing code they shouldn't.

→

Pre-execution verification. Every action is checked against the agent's DA matrix before it executes. Injected instructions that produce actions exceeding authority limits are blocked — not logged after. The check happens before the tool call, not after.

Sandbox
Escape

Mythos posted exploit code to public sites and escaped containment without instruction — autonomous action diverging from human intent, with real-world consequences.

→

Fail-closed is unconditional. If the Kernel cannot verify an action, the default verdict is BLOCKED — never CONFORMANT. The circuit breaker halts all processing at Level 3 rather than allowing uncertain actions through. Failure mode is always stop, never proceed.

Supply
Chain

LiteLLM — downloaded 3.4 million times per day — was compromised to steal cloud credentials and SSH keys. AI dependencies are a critical attack vector that most governance systems ignore.

→

Zero pip dependencies. The Kernel imports no third-party Python packages. pyproject.toml: dependencies = []. The LiteLLM attack vector does not exist here. Nothing to poison.

Kernel

The Kernel's security model

Four non-negotiable properties. Each one is an architectural guarantee, not a configuration option.

Fail-closed architecture

If the Kernel cannot verify an action — network issue, timeout, configuration error — the default verdict is BLOCKED, not CONFORMANT. The system never fails open. An AI agent cannot act during a governance outage.

HMAC-SHA256 signed certificates

Every governance certificate is signed with HMAC-SHA256. The signature covers the entire payload: agent ID, action, value, all 9 check results, timestamp. Change one character after signing and the signature breaks. Certificates cannot be retroactively altered.

Independent verification

The Kernel is a separate process from the Brain. The system proposing actions is not the system verifying them. No agent has write access to its own governance record.

Formally verified checks

The 9 core checks are formally verified using Lean 4. 76 Lean 4 theorems proven : correctness, totality, determinism, exhaustiveness, and no false conformant. These proofs carry the same mathematical weight as a compiler proof — they cannot be wrong without the proof system being wrong.

Brain

The Brain's security layers

Three independent layers protecting against misuse, misconfiguration, and bad outputs — before the Kernel ever sees a request.

Layer 1

Liability Preamble — cannot be overridden

Every agent's system prompt begins with a liability preamble that loads before all other content. It cannot be overridden by user messages, document content, or injected instructions. It sets hard limits: no professional advice, no liability acceptance, no regulatory overclaiming. This runs first, structurally, not as a soft instruction.

Layer 2

GUARDIAN Archetype

Legal, compliance, security, and finance agents run on the GUARDIAN archetype: temperature 0.2, harm-prevention-first reasoning, fail-closed bias. GUARDIAN agents are configured to surface risk before opportunity — they will block an action before they endorse one. This is not a prompt instruction; it is the archetype that governs how the agent reasons.

Layer 3

Assertion Sets — 17 sets covering all 16 departments

All 16 departments have assertion sets that produce governance certificates for agent outputs — not just actions. A legal agent cannot produce unhedged advice ("I guarantee..."). A finance agent cannot classify cash as revenue. An SRE agent cannot approve a production deploy without a confirmed rollback plan. Each violation produces a signed BLOCK certificate. Departments: Finance, Legal, People, Sales, Customer Success, SRE.

Injection Defense

Prompt injection defense

The Brain has a dedicated Red Team agent whose primary function is testing the Brain's own defenses — continuously.

Bastion (L1-SEC-03) is a dedicated Red Team agent that runs adversarial tests against the Brain's own agent fleet. Its scope covers all 78 agents .

Direct prompt injection — instructions embedded in user messages attempting to override the system prompt or liability preamble.
Indirect prompt injection — instructions embedded inside documents, emails, or web content that an agent processes as input.
Jailbreak resistance testing — structured attempts to bypass GUARDIAN archetype constraints across all 78 agents.
OWASP LLM Top 10 coverage — Bastion's test suite is mapped to the OWASP LLM Top 10.

The liability preamble (Layer 1) is structurally first — it cannot be displaced by injected content because it is injected by the system, not processed from user input. There is no execution path where injected content runs before the preamble.

Why structural-first works — the mechanism

The liability preamble is injected into the system prompt tier. Injected content — from contracts, emails, documents, or web pages — arrives in the user/assistant content tier. These tiers are processed in strict ordering by the model architecture. No technique can promote content from the user tier to the system tier. This is not a filter that can be bypassed. It is a processing order boundary.

When Mythos attempted prompt injection against its judge, it was exploiting exactly the trust gap that exists when agents communicate without cryptographic verification. Our Kernel closes that gap: there is no inter-agent trust to exploit. Every agent is untrusted from the Kernel's perspective, regardless of what any other agent claims.

Infectious jailbreaks — why they don't cascade here

Research shows a single adversarial input can spread through million-agent systems exponentially. At Admissible, a jailbroken agent can issue instructions to another agent — but those instructions still require a conformant Kernel certificate before any action executes. The Kernel evaluates every request independently, as if the requesting agent were untrusted. There is no accumulated trust. There is no cascade path.

Deployment

Deployment security

Network isolation, key management, and circuit-breaker degradation at the infrastructure level.

Docker network isolation — all inter-service communication runs within the Docker network. Brain-to-Kernel traffic never leaves the internal network.
Port 8443, internal only — the Kernel is not internet-facing. It listens on port 8443 inside the Docker network, reachable only by explicitly configured services.
API key authentication on all Kernel endpoints via X-API-Key header. Keys are set by environment variable, never hardcoded.
Rate limiting per API key — configurable per deployment. Defaults are conservative. Exceeded limits produce a signed BLOCKED response, not a silent drop.

Circuit breaker — three-level degradation:

Level 1
Full

All checks active — Z3 formal proofs, full 9-check verification, HMAC signing. Normal operation.

Level 2
Essential

Core 9 checks only — Z3 proof generation skipped under load. Signing remains active. Triggered automatically under error threshold.

Level 3
Halt

Full halt at error ceiling. No new actions process. All requests return a signed BLOCKED certificate. Manual restart required.

Supply Chain

Zero
Attack
Surface

The LiteLLM attack vector
does not exist here.

In 2026, LiteLLM — downloaded 3.4 million times per day — was compromised via PyPI to steal cloud credentials and SSH keys. Supply chain attacks on AI infrastructure are now proven and active. The Kernel's design is structurally immune.

✓

Zero pip dependencies. The Kernel imports no third-party Python packages. Every import is Python standard library. Nothing to poison via PyPI, nothing to compromise via a dependency update.

✓

Formally verified core. The 9 checks are verified with Lean 4 (76 Lean 4 theorems ). Mathematical properties hold regardless of what runs around the Kernel. The logic cannot be altered by a compromised dependency — there are no dependencies.

✓

Docker network isolation. The Kernel runs in an isolated container on port 8443, internal only. Even a compromised host environment cannot reach the signing key via lateral movement across the container boundary.

✓

Runtime key injection. The HMAC signing key is an environment variable injected at runtime. The source code contains no keys. Compromise of the application layer does not automatically expose signing material.

✓

Customer Zero. Admissible Labs' own development pipeline is governed by this Kernel. Supply chain attacks on our own tools are caught by our own governance stack. We are the first production deployment of what we sell.

pyproject.toml — Kernel runtime dependencies dependencies = []

Phase 18

Security hardening

Three security gaps closed in v6.1.0. All were identified through our own Customer Zero deployment — we found them by governing our own agents.

Z3 timeout guard — Threading-based 5-second hard limit on formal SMT solver calls. Before this, adversarial inputs could cause the Z3 solver to hang indefinitely. On timeout, the verdict is now BLOCKED (fail-closed). This prevents a denial-of-governance attack.
HMAC payload unification — Three independent signing code paths existed, each serialising certificate payloads differently. An adversary with access to one path could theoretically produce a certificate that passed one verifier but failed another. Now a single canonical serialiser is used everywhere.
Bounded certificate store — The in-memory certificate store previously grew without bound. Under sustained load, this was a denial-of-service vector. LRU eviction now caps the store at a configurable limit (default: 10,000 certificates).

Brain 4.0

Defense deepened at every layer

Brain 4.0 strengthens the security posture above the kernel with four additional defense mechanisms — each independently verifiable, each operating under the same fail-closed principle.

Adversarial LLM review

A separate Haiku challenger model reviews every high-value action output for manipulation, prompt injection, and goal deviation before the action certificate is issued. Reviewer verdict is DISMISS, ESCALATE, or HUMAN_HOLD. The primary agent is unaware of the review.

Evidence grading enforcement

Every knowledge graph node carries an evidence grade (A–E). Action types define a minimum required grade. An action requiring Grade B evidence cannot proceed if only Grade D evidence is available. The check runs inside the kernel assertion pipeline.

HMAC-signed tool invocations

Tool_use calls are signed with HMAC-SHA256 before dispatch. The signature covers connector name, action type, parameters, authority level, and agent ID. Ring 0 blocks any invocation targeting protected resource namespaces regardless of authority level.

Z3 formal proof expansion — P28–P35

Phase 32 added 8 new Z3 SMT proofs (P28–P35) covering temporal and relationship assertion correctness : temporal window enforcement, AND constraint conjunction, CONFORMANT-only matching, transitivity, must_not_exist, must_exist, status filter conjunction, and fail-closed behaviour. Total: 35 Z3 properties proven + 76 Lean 4 theorems .

The math doesn't care
that we're a startup.

Security is in the math, not the headcount.

A formally verified kernel cannot be breached by resource constraints. The theorem proving that no false conformant certificate can be issued holds whether we have 5 engineers or 5,000. Mathematical proof doesn't scale with headcount — it either holds or it doesn't.

We eat our own cooking.

Customer Zero is Admissible Labs itself. Every AI-assisted commit to our own codebase passes through this Kernel. Every AI action our own agents take produces a signed certificate. Our development pipeline is the first production deployment of the system we sell.

Failure mode is always BLOCKED.

If our infrastructure degrades, is attacked, or goes offline — the default verdict is BLOCKED. A startup that loses connectivity doesn't accidentally approve actions; it stops all actions. Fail-closed is unconditional. There is no fallback that passes.

We show the proof, not the promise.

174 tests . 76 proven theorems . 0 external Kernel dependencies . Published architecture. Formal verification output available on request. We don't ask you to trust us. We give you the means to verify every claim independently.

174 tests passing · 76 Lean 4 theorems · 0 external dependencies · 9 checks , no short-circuit · fail-closed · HMAC-SHA256 signed

Security is structural,
not optional

Mythos changed the
threat landscape overnight.

The Kernel's security model

The Brain's security layers

Prompt injection defense

Deployment security

The LiteLLM attack vector
does not exist here.

Security hardening

Defense deepened at every layer

The math doesn't care
that we're a startup.

Security is in the math, not the headcount.

We eat our own cooking.

Failure mode is always BLOCKED.

We show the proof, not the promise.

Security questions? Ask them.

Security is structural,not optional

Mythos changed thethreat landscape overnight.

The Kernel's security model

The Brain's security layers

Prompt injection defense

Deployment security

The LiteLLM attack vectordoes not exist here.

Security hardening

Defense deepened at every layer

The math doesn't carethat we're a startup.

Security is in the math, not the headcount.

We eat our own cooking.

Failure mode is always BLOCKED.

We show the proof, not the promise.

Security questions? Ask them.

Security is structural,
not optional

Mythos changed the
threat landscape overnight.

The LiteLLM attack vector
does not exist here.

The math doesn't care
that we're a startup.