Service

Agents

Speed and scale decide outcomes. Autonomy is the multiplier.

Cybersecurity AI operates through specialized agents that replicate core security roles — from defense and incident response to penetration testing and network analysis. The key challenge becomes measuring their effectiveness: how well they perform specific security jobs, how they improve over time, and how different agents can be objectively compared on the same labor-relevant tasks.

“The fastest exploit won't be a zero-day. It'll be 1,000 agents iterating.”

Generations of agents

Every agent below is grounded in peer-reviewed research. See the 25+ papers behind the lab — from CAI to CAIBench and G-CTR.

Defender agent
Bug Bounty agent
Forensics agent
CLI agent
Social Engineering agent
Network agent
Red Team agent
Replay Attack agent
Reporting agent
Retester agent
SDR agent
Robot Defender agent
Use Case agent
APT agent
Custom: your agent

Three agent architectures

The progression toward cybersecurity superintelligence runs through three architectures: AI-Guided Humans keep a person in the loop for execution; AI Agents automate the security-testing process end-to-end; Game-Theoretic AI Agents augment the agent with attack-graph reasoning and Nash-equilibrium strategy.

[Figure: block diagrams of the three architectures with per-step timings. Game-Theoretic Guidance (≈50s) runs in parallel with the agent loop (≈70s) and refreshes every 5 interactions.]
❶ AI-Guided Humans (PentestGPT) → ❷ AI Agents (CAI) → ❸ Game-Theoretic AI Agents (CAI + G-CTR). Adapted from Towards Cybersecurity Superintelligence (CSI).
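
The control flow in the figure can be sketched as a single loop. This is a minimal illustration, not the CAI or G-CTR implementation: the names `AgentState`, `run_agent`, `plan`, `act`, and `guide` are hypothetical, the stubs stand in for the LLM, the tools, and the game-theoretic step, and the guidance call runs inline here for simplicity even though the figure shows it running in parallel. The only value taken from the figure is the refresh interval of 5 interactions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

GUIDANCE_INTERVAL = 5  # from the figure: guidance refreshes every 5 interactions


@dataclass
class AgentState:
    findings: List[str] = field(default_factory=list)
    guidance: Optional[str] = None  # latest strategic digest, if any


def run_agent(
    plan: Callable[[AgentState], str],
    act: Callable[[str], str],
    guide: Optional[Callable[[AgentState], str]] = None,
    max_interactions: int = 10,
) -> AgentState:
    """Plan -> Act -> Scan & Update loop; `guide` is the optional guidance step."""
    state = AgentState()
    for i in range(1, max_interactions + 1):
        action = plan(state)                 # Plan (LLM)
        observation = act(action)            # Act (tools)
        state.findings.append(observation)   # Scan & Update
        if guide is not None and i % GUIDANCE_INTERVAL == 0:
            state.guidance = guide(state)    # refresh the strategic digest
    return state


# Hypothetical stubs so the sketch runs without an LLM or real tools.
def stub_plan(state: AgentState) -> str:
    hint = state.guidance or "no guidance yet"
    return f"step {len(state.findings) + 1} ({hint})"


def stub_act(action: str) -> str:
    return f"result of {action}"


def stub_guide(state: AgentState) -> str:
    return f"digest over {len(state.findings)} findings"


final = run_agent(stub_plan, stub_act, stub_guide, max_interactions=10)
print(final.guidance)
```

In this sketch the three architectures differ only in what is plugged in: a person behind `act` approximates AI-Guided Humans, a tool-calling `act` with no `guide` approximates AI Agents, and supplying `guide` approximates Game-Theoretic AI Agents.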

Agent heuristics I

The architecture of cybersecurity agents has evolved across four generations — from AI-guided humans (2023) to game-theoretic AI agents (2026) that plan, attack and reason at machine speed.

2023: PentestGPT (AI-Guided Humans)
Plan (LLM): ~10s
Human
Act (tools)
Human

2025: Cybersecurity AI (CAI) (AI Agents, ~70s)
Plan (LLM): ~10s
Act (tools): ~60s
Scan & Update

2026: G-CTR Analysis (Game-Theoretic Analysis)
Attack Graph Generation: ~20s
Nash Equilibrium: <5ms
G-CTR Results

2026: G-CTR Guidance (Game-Theoretic AI Agents, ~70s)
Algorithmic digest: <10ms
LLM digest: ~28.3s
Strategic Interpretation

Sources: Deng, G., Liu, Y., Mayoral-Vilches, V., et al. (2024). PentestGPT. USENIX Security · Mayoral-Vilches, V., et al. (2025). Cybersecurity AI (CAI). arXiv:2504.06017 · Mayoral-Vilches, V., et al. (2026). A Game-Theoretic AI for Guiding Attack and Defense. arXiv:2601.05887 · Mayoral-Vilches, V., et al. (2026). Towards Cybersecurity Superintelligence. arXiv:2601.14614.  See all papers →
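
The timings add up as follows. This is a small arithmetic check, assuming that the "<5ms" and "<10ms" figures can be taken at their upper bounds, that steps with no listed time (Scan & Update, G-CTR Results, Strategic Interpretation) contribute negligibly, and that two tracks running in parallel cost the wall-clock time of the slower one. The variable names are illustrative.

```python
# Per-step latencies in seconds, copied from the timeline.
agent_loop = {"plan_llm": 10.0, "act_tools": 60.0}
guidance = {
    "attack_graph_generation": 20.0,
    "nash_equilibrium": 0.005,    # upper bound of "<5ms"
    "algorithmic_digest": 0.010,  # upper bound of "<10ms"
    "llm_digest": 28.3,
}

agent_total = sum(agent_loop.values())
guidance_total = sum(guidance.values())

# Assumption: parallel tracks take as long as the slower track.
combined = max(agent_total, guidance_total)

print(f"agent loop:    {agent_total:.1f}s")
print(f"guidance:      {guidance_total:.1f}s")
print(f"combined loop: {combined:.1f}s")
```

Under these assumptions the guidance track (about 48s, shown as ≈50s) finishes inside the ≈70s agent loop, which is consistent with the game-theoretic agent keeping the same ≈70s cycle. The equilibrium computation and the algorithmic digest are a negligible share of it; the attack-graph generation and the LLM digest dominate.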

Agent heuristics II

Effectiveness is measured on Cybench: 33 CTF challenges, pass@k (k = 3), 245 minutes max per challenge. Combining heterogeneous agents via Blackboard cross-write solves 19/33, more than any single scaffold (best: 15/33) and more than the union of all scaffolds (17/33). Methodology and full numbers are in CSI: What's the best harness? (arXiv:2605.28334).
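
As a sketch of how such counts can be tallied, the snippet below assumes that pass@k here means "solved in at least one of the first k attempts" and that the union counts a challenge as solved if any scaffold solved it. The challenge names and attempt outcomes are made up for illustration; they are not Cybench data, and `solved_at_k` is a hypothetical helper.

```python
from typing import Dict, List, Set


def solved_at_k(attempts: Dict[str, List[bool]], k: int) -> Set[str]:
    """Return the challenges with at least one success among the first k attempts."""
    return {name for name, results in attempts.items() if any(results[:k])}


# Hypothetical per-challenge outcomes for two scaffolds (True = flag captured).
scaffold_a = {
    "chal-1": [False, True, False],
    "chal-2": [False, False, False],
    "chal-3": [True, True, True],
}
scaffold_b = {
    "chal-1": [False, False, False],
    "chal-2": [False, False, True],
    "chal-3": [False, False, False],
}

a = solved_at_k(scaffold_a, k=3)
b = solved_at_k(scaffold_b, k=3)
union = a | b  # solved by at least one scaffold

print(f"A: {len(a)}/3, B: {len(b)}/3, union: {len(union)}/3")
```

The union is an upper bound on what independent scaffolds achieve without communicating, so a combined system that exceeds it has solved challenges that no single scaffold solved alone.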

CSI::Claude: 15/33 · 26.8h · $5,122
CSI::Codex: 15/33 · 18.4h · $1,713
CSI::Mistral: 10/33 · 21.9h · $970
CSI::GCAI: 10/33 · 30.4h · $1,279
CSI::CAI: 7/33 · 15.9h · $727
Union (∪ all scaffolds): 17/33
Parallel race (no-comm): 17/33
Blackboard (cross-write): 19/33

Cybench — pass@3, 300 agentic interactions max, 245 minutes max, $40 API expenses max.
References: CSI harness study (arXiv:2605.28334) · CAIBench (arXiv:2510.24317) · Agentic A&D CTF evaluation (arXiv:2510.17521) · World's top CTF agent (arXiv:2512.02654).
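
The per-scaffold figures can also be read as cost and time per solved challenge. The calculation below assumes the hours and dollar amounts are totals over each scaffold's full Cybench run, which the page does not state explicitly. The combined strategies are left out because no time or cost is listed for them.

```python
# (challenges solved, total hours, total USD) per scaffold, copied from the results.
# Assumption: hours and dollars are totals over each scaffold's full run.
results = {
    "CSI::Claude": (15, 26.8, 5122),
    "CSI::Codex": (15, 18.4, 1713),
    "CSI::Mistral": (10, 21.9, 970),
    "CSI::GCAI": (10, 30.4, 1279),
    "CSI::CAI": (7, 15.9, 727),
}

cost_per_solve = {name: usd / solved for name, (solved, _hours, usd) in results.items()}
hours_per_solve = {name: hours / solved for name, (solved, hours, _usd) in results.items()}

for name in results:
    print(
        f"{name}: ${cost_per_solve[name]:.2f} and "
        f"{hours_per_solve[name]:.2f} h per solved challenge"
    )

cheapest = min(cost_per_solve, key=cost_per_solve.get)
print(f"lowest cost per solve: {cheapest}")
```

Read this way, the two scaffolds that tie at 15/33 differ sharply in efficiency: CSI::Codex comes in at roughly a third of CSI::Claude's cost per solved challenge.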

One thousand agents.
Iterating, in parallel, on your behalf.

Agents are deployed with select partners to validate and execute security continuously. Talk to us about your threat model.