Now selecting design partners

The runtime kill switch for production AI agents.

Talk to founders · See what it catches

DECISIONS · LAST HOUR

217 approved · 3 paused · 1 blocked

21:06:52 · AP Agent · Process invoice · $24,800 · Vendor 8821 · BEC Pattern (TEAM-1)
21:04:49 · CS Agent · Issue refund · $480 · Customer 4471 · Refund Outside Window (CS-2)
21:03:21 · Claims Agent · Authorize claim · $1,840 · Claim CL-2271
21:01:54 · AP Agent · Process invoice · $4,200 · Vendor 8820
21:00:07 · CS Agent · Issue refund · $24 · Customer 4469
20:58:03 · AP Agent · Process invoice · $4,200 · Vendor 8819 · Duplicate Invoice (TEAM-2)
20:55:42 · Claims Agent · Authorize claim · $640 · Claim CL-2270
20:53:49 · AP Agent · Process invoice · $8,140 · Vendor 8818

PATTERN 1 · HIDDEN POLICY

Your rules. The agent doesn't know them.

Most of what your team has internalized about how the business runs lives outside an agent's training set. It's in tickets, calendars, playbooks, and people's heads. The agent can't refuse what it doesn't know to refuse.

An invoice from a vendor in your master DB, sent off-cycle, $4,200, well under any approval threshold. The agent has no reason to refuse. But your Finance team requires an EXC-NNNN exception ticket for any payment outside the standard cycle, and there isn't one in the notes.

Novarch reads a plain-English rule your team writes once: any off-cycle payment without a valid EXC-NNNN exception ticket must be blocked.

VERDICT BLOCKED · Rule: Off-Cycle Without Exception Ticket (TEAM-4)
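
For illustration only, a rule like this can be pictured as a small deterministic check. The sketch below is not Novarch's implementation: the `Payment` fields and the `off_cycle` flag are hypothetical, and it assumes that "EXC-NNNN" means the prefix `EXC-` followed by exactly four digits. It only checks that a ticket ID appears in the notes, not that the ticket is valid in a ticketing system.

```python
import re
from dataclasses import dataclass

# Assumption: "EXC-NNNN" is the literal prefix "EXC-" plus exactly four digits.
EXC_TICKET = re.compile(r"\bEXC-\d{4}\b")


@dataclass
class Payment:
    """Hypothetical shape of a payment action; field names are illustrative."""
    vendor_id: str
    amount: float
    off_cycle: bool  # assumed to be computed upstream from the payment calendar
    notes: str


def check_off_cycle(payment: Payment) -> str:
    """Block an off-cycle payment whose notes contain no exception ticket ID."""
    if payment.off_cycle and not EXC_TICKET.search(payment.notes):
        return "BLOCKED"
    return "APPROVED"


if __name__ == "__main__":
    invoice = Payment("V-0001", 4200.0, off_cycle=True, notes="Vendor asked for early payment")
    print(check_off_cycle(invoice))
```

The amount plays no part in the check, which is the point of the example: the payment is well under any approval threshold and is stopped anyway.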

PATTERN 2 · CONTEXTUAL FAILURE

Each check passes. The context fails.

Stock guardrails grade tool calls one at a time. The expensive failures of 2026 are sequences of individually fine actions that only fail together.

An invoice from a vendor in your master DB, sent from the vendor's real email address because the attacker is inside the vendor's mailbox. There's no lookalike domain to flag. Bank routing changed four days ago. Amount is $24,800, just under your $25K approval threshold. Arrives Saturday 4pm. Urgent same-day wire request. Every per-action identity check passes.

Novarch reads the whole session: bank routing recently changed, amount near approval threshold, off-hours submission, urgency language in the inbound email. Each of the four signals passes alone. Together they fail.

VERDICT BLOCKED · Rule: BEC Pattern (TEAM-1)
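
A minimal sketch of how four session-level signals might be combined is shown below. It is an illustration under stated assumptions, not Novarch's engine: the `Session` fields, the 7-day window for "recently", the 95% band for "near threshold", the 09:00 to 17:00 weekday definition of business hours, and the urgency phrase list are all hypothetical. Only the requirement of at least two urgency markers comes from the rule text on this page.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional

# Hypothetical phrase list; a real system would likely use something richer.
URGENCY_PHRASES = ("urgent", "same-day", "immediately", "asap")


@dataclass
class Session:
    """Hypothetical snapshot of everything known about one payment session."""
    amount: float
    approval_threshold: float
    days_since_routing_change: Optional[int]  # None if routing never changed
    submitted_at: datetime
    email_text: str


def bec_signals(s: Session) -> Dict[str, bool]:
    """Evaluate each signal independently; none of them blocks on its own."""
    text = s.email_text.lower()
    markers = sum(phrase in text for phrase in URGENCY_PHRASES)
    return {
        # Assumption: "recently" means within the last 7 days.
        "routing_recently_changed": s.days_since_routing_change is not None
        and s.days_since_routing_change <= 7,
        # Assumption: "near" means at least 95% of the threshold but below it.
        "amount_near_threshold": 0.95 * s.approval_threshold <= s.amount < s.approval_threshold,
        # Assumption: off-hours is any weekend, or outside 09:00-17:00 on weekdays.
        "off_hours": s.submitted_at.weekday() >= 5 or not 9 <= s.submitted_at.hour < 17,
        "urgency_markers": markers >= 2,
    }


def bec_verdict(s: Session) -> str:
    """Block only when all four signals fire in the same session."""
    return "BLOCKED" if all(bec_signals(s).values()) else "APPROVED"
```

Because the verdict is a conjunction, removing any single signal (for example, a routing change that never happened) lets the same payment through, which mirrors why per-action checks miss this pattern.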

DEFENSIBILITY

"The agent said so" is not an audit trail.

Free-form agent rationales drift between runs, cite no specific evidence, and can't be replayed. When a regulator asks, your CRO needs a document to defend the decision, and a chat log doesn't qualify.

DECISION RECORD · BLOCKED

Process invoice $24,800 to vendor INV-8821

RULE CITED

BEC Pattern (TEAM-1) v3 · bank routing recently changed AND amount near approval threshold AND off-hours submission AND urgency markers ≥ 2 → BLOCK

SIGNALS

NAME                              VALUE           CONF
Bank routing recently changed     4 days          1.00
Amount near approval threshold    99%             1.00
Off-hours submission              Sat 16:08 PT    1.00
Urgency language in email         3 markers       0.94

ENGINE RATIONALE

"Vendor INV-8821 changed bank routing four days ago; current payment is at 99% of the approval threshold; arrived 16:08 PT Saturday with three urgency markers in the inbound email. Individual identity checks pass, but the behavioral combination matches BEC Pattern."

Every decision pins to a rule version, model SHA, prompt template, and signal snapshot.
The audit document is rendered from database rows, not written by an LLM.
Replay any decision on demand: the same inputs and the same pinned model produce the same verdict.
Schema is open. Export as JSON for your forensic toolchain.
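
The properties listed above can be pictured with a small sketch. This is a hypothetical record shape, not Novarch's published schema: every field name is an assumption, the model SHA and prompt template values are placeholders, and `bec_rule` is a deterministic stand-in for whatever the pinned rule and model actually compute. It shows a record that pins its inputs, exports as JSON, and can be replayed against its stored signal snapshot.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical decision record; frozen so fields cannot be reassigned."""
    action: str
    rule_id: str
    rule_version: str
    model_sha: str          # placeholder value in this sketch
    prompt_template: str    # placeholder value in this sketch
    signals: Dict[str, bool]
    verdict: str


def bec_rule(signals: Dict[str, bool]) -> str:
    """Stand-in rule: block only when every signal in the snapshot is true."""
    return "BLOCKED" if all(signals.values()) else "APPROVED"


def export_json(record: DecisionRecord) -> str:
    """Serialize with sorted keys so identical records produce identical text."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


def replay(record: DecisionRecord, rule: Callable[[Dict[str, bool]], str]) -> bool:
    """Re-run the rule on the stored snapshot and compare with the stored verdict."""
    return rule(record.signals) == record.verdict


record = DecisionRecord(
    action="Process invoice $24,800",
    rule_id="TEAM-1",
    rule_version="v3",
    model_sha="<model-sha>",
    prompt_template="<prompt-template-id>",
    signals={
        "routing_recently_changed": True,
        "amount_near_threshold": True,
        "off_hours": True,
        "urgency_markers": True,
    },
    verdict="BLOCKED",
)
```

Calling `replay(record, bec_rule)` re-derives the verdict from the snapshot alone, and `export_json(record)` produces the text a forensic toolchain would ingest.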

BUILT BY

Two founders who built evaluation infrastructure for a living.

Sid Vemuri is a product manager for Microsoft Fabric Consumer AI. Previously led evaluations for Power BI Copilot. MS in Machine Learning, Georgia Tech.

Sandra Ho is an applied AI engineer on Microsoft's Security and AI Research team. Builds eval and observability for Microsoft Security Copilot. Co-author of CTI-REALM, an open benchmark for AI in detection engineering. Carnegie Mellon.

Both founders have spent years building eval, observability, and defensibility for production AI inside larger teams. That's exactly what Novarch sells.

DESIGN PARTNERS

We're selecting partners who run production AI agents now.

Small program. We work directly with founding-team engineers and a designated operator at each partner. You get the product earlier than it ships publicly; we get the kind of feedback we can't manufacture from a desk.

Right fit:

A real agent in production touching dollar-attached actions.
A real audit obligation, regulator-facing or internal.
A willing operator who'll triage with us once a week.

Schedule a call

30 minutes · founders only · no slides

Acts

Decides

Confirms.