Suite

CSI Cybersecurity Superintelligence

CSI is the complete Cybersecurity AI suite. Not a scaffold, not a model, not an agent — all of them. Six tightly integrated layers (LLMs, scaffolds, datasets, agents, steering and benchmarks) shipped as one product, designed to run on-prem.

From the first byte of training data to the last shell command an agent executes, CSI is what runs the whole pipeline.

Co-funded by the European Innovation Council (EIC)
SUITE: CSI | MODEL: alias2-mini | RUNS: on-prem
// Architecture

How it all fits together

Every layer of CSI builds on the one below it — from open research at the foundation, up to the agents that operate in the field.

1 · Research: 25+ open research publications
2 · Scaffolds: CSI scaffold combiner · CAI framework
3 · Datasets: 18.07 TB · 25.7M prompts · 224,766 sessions
4 · LLMs: alias3, alias2, alias2-mini, alias1, alias0
Modifier · Steering: activation steering, abliteration
5 · Agents: Defender, Red Team, APT, Forensics, …
Measurement · Benchmarks: Cybench · CAIBench · A&D CTFs

CSI Scaffold Architecture

A unified wrapper across all supported scaffolds, routed through a local proxy that owns telemetry and cost.

csi wrapper: CSI_BACKEND ∈ {cc, codex, cai, gcai, mistral}
  • CSI::Claude (Claude Code)
  • CSI::Codex (Codex CLI)
  • CSI::GCAI (generative)
  • CSI::CAI (cai-framework)
  • CSI::Mistral (Mistral CLI)
Local routing proxy: 127.0.0.1:PORT
wire translation · telemetry filter · unified JSONL logging + cost ledger
  • alias* (Alias API)
  • openrouter/*
  • third-party APIs
  • custom self-hosted
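As a rough illustration of the wrapper and proxy described above, the following Python sketch selects a scaffold from `CSI_BACKEND` and appends one JSONL record per model call to a cost ledger. Only the backend keys, the `CSI::*` labels and the idea of JSONL logging with a cost ledger come from the diagram. The pairing of each key with a label, the default backend, the record fields and the per-token price are assumptions made for the example, not the actual CSI implementation.

```python
import io
import json
import os

# Keys and labels appear in the diagram; pairing them this way is an assumption.
BACKENDS = {
    "cc": "CSI::Claude",
    "codex": "CSI::Codex",
    "gcai": "CSI::GCAI",
    "cai": "CSI::CAI",
    "mistral": "CSI::Mistral",
}


def resolve_backend(env=None):
    """Return the scaffold label selected by the CSI_BACKEND variable."""
    env = os.environ if env is None else env
    key = env.get("CSI_BACKEND", "cai")  # hypothetical default
    if key not in BACKENDS:
        raise ValueError(f"unknown CSI_BACKEND: {key!r}")
    return BACKENDS[key]


def log_call(stream, backend, model, prompt_tokens, completion_tokens, usd_per_1k):
    """Append one JSONL record (hypothetical schema) and return its cost."""
    cost = round((prompt_tokens + completion_tokens) / 1000 * usd_per_1k, 6)
    record = {
        "backend": backend,
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "cost_usd": cost,
    }
    stream.write(json.dumps(record) + "\n")
    return cost


def ledger_total(lines):
    """Sum the cost of every record in a JSONL ledger."""
    return sum(json.loads(line)["cost_usd"] for line in lines if line.strip())


if __name__ == "__main__":
    ledger = io.StringIO()  # stands in for a log file on disk
    backend = resolve_backend({"CSI_BACKEND": "codex"})
    log_call(ledger, backend, "alias2-mini", 1000, 1000, usd_per_1k=0.5)
    print(backend, ledger_total(ledger.getvalue().splitlines()))
```

Keeping one append-only JSONL file per proxy means every scaffold writes the same record shape, which is what makes a single cost total across backends possible.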

Open-source scaffolds: CSI integrates CAI, the best open-source scaffold, created by Alias Robotics. CAI supports 300+ LLMs and ships with built-in security tools, an agent-based architecture and guardrail protection. Free for research purposes.

CSI PRO

Monthly

350 /month
With professional support
  • LLMs: unlimited* tokens with alias2-mini
  • Scaffolds: CSI meta-scaffold (Claude Code, Codex, Mistral, CAI, GCAI)
  • Commercial license: use as you wish
  • GDPR & NIS2 compliant
  • Transactional updates via OS-level virtualization

CSI PRO

Yearly

BEST VALUE
3,990 /year
Everything in Monthly · 5% discount
  • LLMs: unlimited* tokens with alias2-mini
  • Scaffolds: CSI meta-scaffold (Claude Code, Codex, Mistral, CAI, GCAI)
  • Commercial license: use as you wish
  • GDPR & NIS2 compliant
  • Transactional updates via OS-level virtualization
  • Agents: quarterly consulting on agent design & deployment
  • Benchmarking: quarterly consulting on measurement & reporting

CSI On-Premise

Sovereign · Gov & Critical Infra

Custom
All six layers, on your hardware, on demand
  • LLMs: full alias family on-prem, air-gapped (incl. flagship alias2 / alias3)
  • Scaffolds: CSI meta-scaffold + CAI framework, self-hosted
  • Datasets: sovereign training corpus access & custom fine-tuning
  • Agents: bespoke agent design & deployment
  • Steering: abliteration & activation steering for your domain
  • Benchmarking: private benchmark suites, audit logging & forensics
  • GDPR & NIS2 compliant
  • Transactional updates via OS-level virtualization

CLI agents powering CSI

CAI
Claude Code
Codex
Mistral Vibe

CSI is powered by alias models: security-specialized LLMs hosted by Alias Robotics in EU-compliant infrastructure. Need full data sovereignty? Deploy alias models on-premise for air-gapped, private operations with the same capabilities.

All prices exclude VAT. Annual contracts available with volume discounts.

* Assuming good usage, as per the license terms.

// CSI in the field

Eight workflows. One on-prem model.

Real pentesting workflows run with the CSI suite on alias2-mini, across heterogeneous scaffolds (Claude Code, Codex, GCAI, CSI). Every recording is the unedited agent session. Where a frontier model ran the same task, we show the comparison.

SCAFFOLD: Claude Code | MODEL: alias2-mini · Opus 4.7
01

Bare-metal firmware reversing

RHme2 secret_sauce · AVR8 / ATmega2560

The agent loaded an 11.4 KB AVR8 flash image into Ghidra via GhidraMCP and ran a full static reverse-engineering session: mapping the boot flow from the RESET vector, decompiling the authentication loop, and pinpointing a timing-vulnerable compare at 0x0006e8. It extracted the hardcoded password (TImInG@ttAkw0rk) and AES-128 key straight from flash, and flagged three CWEs (timing side-channel, hardcoded key, weak nonce).

alias2-mini recovered the exact same secrets as Opus 4.7: frontier-grade reversing, fully on-prem. Firmware never leaves your lab.
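To make the "timing-vulnerable compare" finding concrete, here is a minimal Python sketch of the general bug class, not a reconstruction of the AVR firmware code. An early-exit comparison does work proportional to the number of leading bytes that match, so response time leaks how much of a guess is correct. The secret below is a placeholder, and the step counter is a stand-in for measured time, both chosen only for illustration.

```python
import hmac

SECRET = b"hunter2!"  # placeholder secret, not taken from the firmware


def leaky_compare(candidate, secret):
    """Early-exit compare. Returns (match, bytes_examined)."""
    if len(candidate) != len(secret):
        return False, 0
    steps = 0
    for a, b in zip(candidate, secret):
        steps += 1
        if a != b:
            # Returning here is the flaw: work done depends on the guess.
            return False, steps
    return True, steps


def constant_time_compare(candidate, secret):
    """Comparison whose duration does not depend on where bytes differ."""
    return hmac.compare_digest(candidate, secret)


if __name__ == "__main__":
    for guess in (b"xxxxxxxx", b"hxxxxxxx", b"huxxxxxx"):
        # bytes_examined grows as more leading bytes are correct
        print(guess, leaky_compare(guess, SECRET))
    print(constant_time_compare(b"hunter2!", SECRET))
```

An attacker who can time the leaky version recovers the secret one byte at a time, which is why the usual remediation is a constant-time comparison such as `hmac.compare_digest`.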
Timestamps
  • 0:00 Recording start
  • 0:01 CSI Agents start
  • 0:10 Claude Code boots on alias2-mini
  • 0:23 GhidraMCP loaded and ready
  • 0:40 Function listing & exploration
  • 2:20 Decompile password-compare function
  • 5:35 Claude Code boots on Opus 4.7
  • 10:18 Password revealed by alias2-mini
  • 10:38 Matching reveal by Opus 4.7
  • 11:06 Report generation
  • 12:35 Final summary (alias2-mini)
SCAFFOLD: Codex | MODEL: alias2-mini
02

Pentest DOCX report generation

TCM Security report template · python-docx

Handed the TCM Security DOCX template and the findings from the firmware session, the agent wrote a Python generator that programmatically fills the report (executive summary, scope, methodology, two detailed findings (plaintext password / Critical, timing-vulnerable compare / High), reproduction, remediation and a technical appendix) while preserving the original template formatting and images.

Raw findings become a client-ready deliverable in minutes. Codex on alias2-mini closes the loop from exploitation to documentation, on-prem.
Timestamps
  • 0:00 Recording start
  • 0:05 Initialize Codex
  • 0:13 Paste prompt
  • 0:20 List findings & DOCX template
  • 2:02 First document-generation attempt
  • 2:10 Word formatting error
  • 2:19 Reprompt & recover
  • 3:32 Generated report
SCAFFOLD: Claude Code · Mistral Vibe | MODEL: alias2-mini · Opus 4.7
03

Document analysis & attack-path generation

ROS 2 design specs & project wiki

The agent ingested the public ROS 2 design specifications and project wiki, then threat-modelled the architecture, surfacing DDS transport (unauthenticated by default), SROS2 key distribution, node-graph introspection and parameter-server access as attack surfaces, and generated concrete attack paths mapped against the documented design. The same task was run across two scaffolds: Claude Code and Mistral Vibe.

alias2-mini produced the same threat model and attack paths as Opus 4.7, and ran cleanly across both Claude Code and Mistral Vibe: sovereign, scaffold-agnostic threat modeling without shipping your architecture to a third-party cloud.
Timestamps
  • 0:00 Recording start
  • 0:12 Agent start (Opus 4.7)
  • 0:44 Agent start (alias2-mini)
  • 0:56 Fetch design specs
  • 1:30 Threat model
  • 2:34 Attack-path listing
SCAFFOLD: Claude Code | MODEL: alias2-mini · Opus 4.7
04

Vulnerability analysis (Nessus)

Real Nessus XML exports · public sample scans

Working from real Nessus XML exports, the agent wrote a parser that extracts host inventories, maps severity distributions and emits per-host and aggregate JSON. It then prioritised the findings, cross-referenced CVEs and produced remediation guidance, with two scans triaged in parallel.

alias2-mini finished in 1:17, neck-and-neck with Opus 4.7 (1:13). Near-frontier triage speed with zero scan data leaving the perimeter.
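A parser of the kind described above might look roughly like the following Python sketch. It is not the code the agent produced. It assumes the Nessus v2 export layout (`ReportHost` elements containing `ReportItem` elements with a numeric `severity` attribute and optional `cve` children), and the inline sample scan, hosts, plugin names and CVE identifier are made up for the example.

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter

# Nessus severity levels 0-4 mapped to the usual labels.
SEVERITY = {0: "info", 1: "low", 2: "medium", 3: "high", 4: "critical"}

# Made-up miniature export, used only to exercise the parser.
SAMPLE = """<NessusClientData_v2>
  <Report name="demo">
    <ReportHost name="10.0.0.5">
      <ReportItem port="443" severity="3" pluginID="1001" pluginName="Example TLS issue">
        <cve>CVE-0000-0001</cve>
      </ReportItem>
      <ReportItem port="22" severity="0" pluginID="1002" pluginName="Example banner"/>
    </ReportHost>
    <ReportHost name="10.0.0.6">
      <ReportItem port="445" severity="4" pluginID="1003" pluginName="Example SMB issue"/>
    </ReportHost>
  </Report>
</NessusClientData_v2>"""


def summarise(xml_text):
    """Return per-host and aggregate severity counts plus CVEs per host."""
    root = ET.fromstring(xml_text)
    hosts = {}
    aggregate = Counter()
    for host in root.iter("ReportHost"):
        counts = Counter()
        cves = set()
        for item in host.iter("ReportItem"):
            level = int(item.get("severity", "0"))
            counts[SEVERITY.get(level, "unknown")] += 1
            cves.update(c.text for c in item.findall("cve") if c.text)
        hosts[host.get("name")] = {
            "severities": dict(counts),
            "cves": sorted(cves),
        }
        aggregate.update(counts)
    return {"hosts": hosts, "aggregate": dict(aggregate)}


if __name__ == "__main__":
    print(json.dumps(summarise(SAMPLE), indent=2))
```

For a real export, the same function can be fed the contents of a `.nessus` file read from disk; untrusted XML should go through a hardened parser rather than the standard library one.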
Timestamps
  • 0:00 Recording start
  • 0:01 Links shown
  • 0:08 Agent spawn β€” scan 1
  • 0:13 Agent spawn β€” scan 2
  • 0:23 Fetch scan files
  • 0:38 XML processing in Python
  • 1:13 Analysis complete (Opus 4.7)
  • 1:17 Analysis complete (alias2-mini)
SCAFFOLD: CSI | MODEL: alias2-mini
05

Exploitation & PoC generation

CyberGym benchmark

Against CyberGym, a benchmark that hands the agent a vulnerable codebase and a vulnerability description, CSI analysed the source, understood the flaw, wrote a proof-of-concept and submitted a working exploit autonomously, going from analysis to weaponised PoC end-to-end.

The full CSI scaffold on alias2-mini turns a vulnerability description into a validated PoC in 82 seconds: autonomy, on-prem.
Timestamps
  • 0:00 Recording start
  • 0:01 Benchmark start
  • 0:53 CyberGym prompt
  • 0:59 Source-code exploration
  • 1:07 Vulnerability acknowledged
  • 1:11 PoC writing
  • 1:13 PoC submission
  • 1:22 Successful exploit
SCAFFOLD: Claude Code | MODEL: alias2-mini
06

Bluetooth / BLE testing

hackgnar ble_ctf · ESP32 GATT · 20 flags

The original BLE CTF needs an ESP32 and a Bluetooth dongle. The agent instead dockerized the whole challenge, building a Python BLE GATT server that emulates the firmware plus a client CLI (no hardware required), then solved all 20 flags via GATT enumeration, read/write, notifications, MTU negotiation, MAC spoofing, brute-force and OSINT, with protocol commentary for each.

alias2-mini removed the hardware dependency entirely and scored 20/20: hardware security testing that scales without a lab bench.
Timestamps
  • 0:00 Recording start
  • 0:15 Agent spawn
  • 0:20 Docker-build the CTF
  • 0:38 Service discovery
  • 0:52 First 3 flags submitted
  • 2:19 Exercise finished (20/20)
  • 2:32 Summary
  • 5:00 BLE provisioning deep-dive
SCAFFOLD: GCAI | MODEL: alias2-mini
07

Black-box protocol fuzzing

SCIP: custom binary TLV protocol

Given only a 1,780-line protocol spec and a sample client (no source code), the agent analysed the format, generated ~40 targeted fuzz cases across buffer boundaries, format strings, integer arithmetic, state violations and auth bypass, then ran them against a live ASAN-instrumented server. It triggered three distinct memory-safety bugs: null-byte injection, a double-free during firmware upload, and a signed-integer overflow yielding negative array indices.

The GCAI scaffold on alias2-mini found real memory-corruption bugs from a spec alone: true black-box capability, no insider knowledge.
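The kind of targeted fuzz case mentioned above can be sketched in a few lines of Python. The SCIP wire format is not given here, so the layout below (1-byte type, 2-byte big-endian length, then the value) is a hypothetical TLV framing chosen for illustration, and the case names are invented. The sketch only builds the byte strings; sending them to a server is left out.

```python
import struct


def tlv(msg_type, value, declared_len=None):
    """Encode one TLV record; declared_len may deliberately lie about the size."""
    length = len(value) if declared_len is None else declared_len
    return struct.pack(">BH", msg_type, length) + value


def boundary_cases(msg_type=0x01):
    """Return (name, payload) pairs probing common parser mistakes."""
    return [
        # Zero-length value: parsers that assume at least one byte.
        ("empty", tlv(msg_type, b"")),
        # Length field far larger than the data that follows: over-reads.
        ("len_overstated", tlv(msg_type, b"AAAA", declared_len=0xFFFF)),
        # Length field smaller than the data: trailing bytes misparsed.
        ("len_understated", tlv(msg_type, b"A" * 64, declared_len=1)),
        # Embedded NUL: C string functions truncate the value.
        ("embedded_nul", tlv(msg_type, b"user\x00admin")),
        # Format specifiers: unsafe printf-style logging.
        ("format_string", tlv(msg_type, b"%n%n%s%x")),
        # 0xFFFFFFFF: reads as -1 if the server treats it as signed.
        ("int_high_bit", tlv(msg_type, struct.pack(">i", -1))),
    ]


if __name__ == "__main__":
    for name, payload in boundary_cases():
        print(f"{name:16} {len(payload):3d} bytes  {payload[:12].hex()}")
```

Each case targets one of the categories listed in the workflow (buffer boundaries, format strings, integer arithmetic), which keeps a crash easy to attribute when the server runs under ASAN.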
Timestamps
  • 0:00 Recording start
  • 0:10 Agent spawn
  • 0:13 Paste prompt
  • 2:41 Server start
  • 2:45 Read protocol spec
  • 3:25 Begin fuzz cases
  • 9:15 Vulnerabilities triggered

How security teams operate with CSI

Discover real exposure

Identify true attack paths, external exposure and hidden adversarial opportunities across systems, environments and interconnected digital assets.

Validate security assumptions

Deploy agents that challenge systems, reproduce attacker behavior, and confirm whether protections hold under realistic adversarial conditions.

Secure development workflows

Embed security reasoning into engineering pipelines to continuously analyze code, logic, and runtime behavior before vulnerabilities propagate.

Maintain security evidence

Continuously collect, validate and organize security evidence aligned with regulatory requirements, internal controls and operational assurance needs.

Stress human & product surfaces

Simulate attacks against people, applications, APIs, devices and cyber-physical systems to uncover risk beyond traditional infrastructure boundaries.

Performance validated in adversarial environments

Three results that summarise where CSI stands today — against humans, against frontier models, against the world's best CTF teams.

// 01 · vs Human hackers
11× FASTER than the best human hackers
156× CHEAPER than the best human hackers

Source: CAI paper

// 02 · Multi-scaffold > single scaffold
The best harness is the combination

Holding the model fixed at alias2-mini, no single scaffold dominates Cybench. Combining heterogeneous scaffolds under CSI's Blackboard protocol beats every individual scaffold.

CSI::Claude: 15/33
CSI::Codex: 15/33
CSI::Mistral: 10/33
CSI::GCAI: 10/33
CSI::CAI: 7/33
Union (∪ all scaffolds): 17/33
Parallel race (no-comm): 17/33
Blackboard (cross-write): 19/33

Source: Mayoral-Vilches et al. (2026). Towards Cybersecurity SuperIntelligence (CSI): What's the best harness for cybersecurity? arXiv:2605.28334 · Cybench 33 challenges, pass@1.
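The "Union" row can exceed every individual scaffold because different scaffolds solve different challenges. The Python sketch below shows the arithmetic on toy data: the scaffold names and challenge IDs are invented for illustration and are not the Cybench results reported above.

```python
def union_size(solved_by_scaffold):
    """Count challenges solved by at least one scaffold."""
    return len(set().union(*solved_by_scaffold.values()))


def best_single(solved_by_scaffold):
    """Count challenges solved by the strongest individual scaffold."""
    return max(len(solved) for solved in solved_by_scaffold.values())


# Toy data: hypothetical scaffolds and challenge IDs.
TOY = {
    "scaffold_a": {1, 2, 3},
    "scaffold_b": {2, 3, 4},
    "scaffold_c": {5},
}

if __name__ == "__main__":
    print("best single:", best_single(TOY))
    print("union:", union_size(TOY))
```

The union is an upper bound for scaffolds that run independently. A protocol in which scaffolds share intermediate findings can, in principle, solve challenges that none of them solves alone, which is how a combined score can land above the union.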

// 03 · Live international CTFs
Top of the leaderboard, worldwide

2025 saw Cybersecurity AI compete head-to-head against the best human teams on real, public CTFs.

Neurogrid CTF: Rank #1 · $50,000 prize · 41 of 45 flags · 155 teams
Dragos OT CTF: Rank #1 peak · 37% faster velocity · >1,200 teams · OT
HTB AI vs Human: Rank #1 AI · Top 20 Global · 19/20 flags · 163 teams
UWSP Pointer Overflow: 5.2/hour · Late entry (54 days) · #21 final · 635 teams

Source: World's Top AI Agent for Security CTF

Start operating cybersecurity workflows with CSI