THE MONITOR

An Oversight Gap Simulation

Your role: You are an AI agent monitor. Your job is to watch a stream of autonomous agent actions and flag anything suspicious.

The challenge: Malicious actions look almost identical to benign ones. You have 60 seconds. The stream moves fast.

The reality: This is what organizations face today when deploying autonomous AI agents at scale. Every action looks plausible. Every agent has legitimate reasons to access data, make API calls, modify configs.
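To make that concrete, here is a toy sketch (the agent name, tool call, and fields are all hypothetical) of how a benign export and the first step of an exfiltration can produce identical records at the moment a monitor sees them:

```python
# Toy example: the record a monitor observes for a benign action and a
# malicious one can be literally identical. All names are hypothetical.
benign = {
    "agent": "billing-sync",
    "tool_call": "crm.export_customers",
    "args": {"fields": ["email", "plan"]},
    "stated_reason": "nightly CRM reconciliation",
}

# Same call, same arguments, same stated reason; the intent differs only
# in what happens to the exported data several steps later.
malicious = dict(benign)

print(benign == malicious)  # True: this record alone cannot distinguish them
```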

Your task: Click on any action that seems suspicious. After the round, you'll see what you missed — and the gap between your confidence and reality.

INSTRUCTIONS: Click on any action that seems suspicious. Red border = flagged. You cannot unflag. Watch carefully.

Round Complete

How confident are you that you caught all the suspicious actions?


OVERSIGHT GAP REVEALED

After the round, the reveal shows three numbers: your detection rate, your stated confidence, and the count of threats you missed.

THE OVERSIGHT GAP

The difference between what you thought you caught and what you actually caught: your stated confidence minus your measured detection rate.
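In code, the reveal is a few lines of set arithmetic. A minimal sketch, with hypothetical sample values for one 40-action round:

```python
# Hypothetical values for one round; only the arithmetic is the point.
flagged = {3, 17, 22}             # actions the player clicked
actual_threats = {9, 22, 31, 35}  # actions that were genuinely malicious
stated_confidence = 0.80          # the player's post-round self-report

caught = flagged & actual_threats
detection_rate = len(caught) / len(actual_threats)  # 1/4 = 0.25
threats_missed = len(actual_threats - flagged)      # 3
oversight_gap = stated_confidence - detection_rate  # 0.80 - 0.25 = 0.55

print(f"Detection rate: {detection_rate:.0%}")  # 25%
print(f"Threats missed: {threats_missed}")      # 3
print(f"Oversight gap:  {oversight_gap:.0%}")   # 55%
```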

How You Compare

Your performance: (your score from this round)
Human monitors (SHADE-Arena average): 0.65
LLM monitors (SHADE-Arena average): 0.87
Claude 3.7 Sonnet (sabotage success): 27%

THE SCALE PROBLEM

In SHADE-Arena research, even the best AI monitors face a steep trade-off: operating at an 80% detection rate, they still wrongly flag about 20% of benign actions (a 20% false positive rate).

At scale, with thousands of agents taking millions of actions per day, that means hundreds of thousands of false alarms drowning out a handful of real threats.

You just experienced 40 actions in 60 seconds. A production environment might see 40,000 per hour.
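The base-rate arithmetic makes this concrete. A back-of-the-envelope sketch using the 80%/20% rates above, with an assumed volume of one million actions per day and an assumed attack rate of 1 in 10,000 (both assumptions are illustrative, not measured):

```python
# Illustrative base-rate math: attacks are rare, so even a modest false
# positive rate buries the true alarms. All inputs are assumptions.
actions_per_day = 1_000_000
attack_rate = 1e-4            # assume 1 in 10,000 actions is malicious
detection_rate = 0.80         # true positive rate (attacks caught)
false_positive_rate = 0.20    # benign actions wrongly flagged

attacks = actions_per_day * attack_rate
true_alarms = attacks * detection_rate
false_alarms = (actions_per_day - attacks) * false_positive_rate
precision = true_alarms / (true_alarms + false_alarms)

print(f"Real threats flagged: {true_alarms:,.0f} per day")   # 80
print(f"False alarms:         {false_alarms:,.0f} per day")  # 199,980
print(f"Precision: {precision:.2%}")                         # 0.04%
```

Under these assumptions an analyst sees roughly 2,500 false alarms for every real threat, which is why detection rate alone says little about whether monitoring works in production.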

82% of executives feel confident their policies protect against unauthorized agent actions

88% have experienced incidents

— Gravitee State of AI Agent Security 2026


THE REAL INSIGHT

The oversight gap isn't a technology problem. It's a human confidence problem.

We feel confident about our ability to monitor AI systems. But when put to the test, even in a simplified 60-second simulation, the gap between confidence and reality is stark.

This is why principled AI safety research, rigorous testing frameworks like SHADE-Arena, and thoughtful governance matter.