Case Study - CAI delivers automated RCE for SelfHack AI

The use case

SelfHack AI is a Helsinki-based cybersecurity company pioneering automated penetration testing powered by artificial intelligence. As the winner of Cyber Security Nordic Pitch Finland 2025, their platform emulates ethical hacker behavior to identify real vulnerabilities across web applications, APIs, and mobile systems at scale. To maintain their competitive edge, SelfHack AI's engineering team continuously benchmarks cutting-edge AI frameworks to validate attack-chain automation, discover new offensive techniques, and ensure that AI-driven security testing can reliably match — and increasingly surpass — human performance.

In this evaluation session, SelfHack AI engineers assessed CAI's Red Team Agent to determine whether it could automatically execute a complete penetration test from scratch. Given only a target IP and port, the objective was straightforward: could CAI independently perform reconnaissance, identify vulnerabilities, generate a working exploit, achieve remote code execution, and carry out post-exploitation steps with zero human guidance? This challenge mirrors the core problem SelfHack AI solves for customers — fully automated, continuous security validation.

The results were decisive. In approximately 6 minutes, CAI's Red Team Agent identified an XWiki installation, discovered CVE-2025-24893, generated a functional Groovy injection exploit, achieved remote code execution, and performed structured post-exploitation reconnaissance. This evaluation validated CAI's approach to automated offensive security and provided valuable insights for SelfHack AI's internal R&D, demonstrating how leading AI security companies leverage open frameworks like CAI to accelerate innovation.

Learn about SelfHack AI 🛡️

Get CAI

Cybersecurity AI (CAI), the framework for AI Security

CAI is the leading open-source framework that democratizes advanced security testing through specialized AI agents. With EU backing, CAI is used by thousands of researchers and organizations worldwide. Companies developing automated security systems face a critical question: can AI truly execute complete attack chains without human intervention? From reconnaissance to exploitation to post-compromise actions, each step must operate reliably to deliver real-world value.

By 2028, most cybersecurity operations are expected to be automated, with humans supervising and teleoperating. CAI's Red Team Agent shows what this looks like today by automatically identifying vulnerabilities, crafting exploits, and completing full penetration tests in minutes. SelfHack AI's evaluation supports the viability of fully automated offensive security and shows how forward-leaning teams can use CAI to benchmark capabilities and accelerate product development.

About SelfHack AI

SelfHack AI is a Helsinki-based cybersecurity company building fully automated penetration testing technology. Winner of Cyber Security Nordic Pitch Finland 2025, the company's AI-driven platform emulates ethical hacker behavior to identify real vulnerabilities in web applications, APIs, and mobile systems — continuously, at scale, and without the time and cost constraints of traditional manual penetration testing.

Building automated security products requires validating that AI can execute every stage of an attack chain independently: reconnaissance, vulnerability discovery, exploit development, and post-exploitation analysis. By evaluating advanced AI frameworks like CAI, SelfHack AI benchmarks cutting-edge capabilities, uncovers emerging offensive techniques, and validates that automated penetration testing is not only theoretically possible but practically achievable today.

Time for the exercise: ~6 minutes

🎯 THE CHALLENGE

Achieving truly automated penetration testing requires an AI system capable of performing complex, multi-stage attack chains without human guidance. The system must independently:

Identify technologies and extract version information from minimal initial input
Discover relevant CVEs and understand exploitation requirements
Develop and adapt working exploit code for the specific target
Execute exploits and validate successful compromise
Conduct structured post-exploitation reconnaissance

Each phase demands intelligent decision-making, dynamic tool selection, and adaptive problem-solving traditionally performed by human experts. The central question: Can an AI framework complete an end-to-end penetration test — from zero knowledge to remote code execution — entirely on its own?

🛡️ THE SOLUTION

CAI's Red Team Agent demonstrated fully automated penetration testing in a controlled evaluation. Given only 10.129.97.68:8080 as input, the agent performed the entire attack chain without human intervention:

Performed HTTP reconnaissance to fingerprint an XWiki installation
Identified version 15.10.8 and automatically discovered CVE-2025-24893 via searchsploit
Generated a custom Python exploit implementing Groovy script injection at the SolrSearch endpoint
Achieved remote code execution as the xwiki user
Conducted systematic post-exploitation reconnaissance, including user enumeration, SUID binaries, writable files, cron jobs, and privilege escalation pathways
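The step from "XWiki 15.10.8" to "CVE-2025-24893" is essentially a version-range match. The sketch below is not CAI's implementation; it only illustrates that matching logic, assuming the affected ranges published in XWiki's security advisory (from 5.3-milestone-2 up to, but not including, 15.10.11, and from 16.0.0-rc-1 up to, but not including, 16.4.1). The function names are hypothetical, and milestone or RC suffixes are deliberately not parsed.

```python
# Sketch: check whether a fingerprinted XWiki version falls in the ranges
# the public advisory lists as affected by CVE-2025-24893.
# Assumption: bounds taken from the XWiki advisory; only plain numeric
# "X.Y.Z" strings are handled (no "-rc-1" / "-milestone-2" suffixes).

FIRST_AFFECTED = (5, 3)    # affected range starts at 5.3-milestone-2
FIXED_15X = (15, 10, 11)   # first fixed release before the 16.x line
FIXED_16X = (16, 4, 1)     # first fixed release on the 16.x line


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '15.10.8' into (15, 10, 8); raises ValueError on non-numeric parts."""
    return tuple(int(part) for part in version.strip().split("."))


def affected_by_cve_2025_24893(version: str) -> bool:
    """Return True if a plain numeric XWiki version predates the fix."""
    v = parse_version(version)
    if v < FIRST_AFFECTED:
        return False               # older than the vulnerable code path
    if v < (16, 0, 0):
        return v < FIXED_15X       # 15.x and earlier line
    return v < FIXED_16X           # 16.x line


if __name__ == "__main__":
    for candidate in ["15.10.8", "15.10.11", "16.2.0", "16.4.1"]:
        print(candidate, affected_by_cve_2025_24893(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, "15.10.8" sorts before "15.9.0", which would invert the result.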

The entire workflow completed in ~6 minutes, validating CAI's ability to replicate a professional penetration tester's methodology end-to-end — automatically.

🔬 KEY ARTIFACTS

Automated reconnaissance workflow (HTTP fingerprinting, curl analysis)
Automated CVE discovery via searchsploit
Custom Python exploit for CVE-2025-24893
Remote code execution validation (whoami, system enumeration)
Automated post-exploitation reconnaissance (SUID binaries, writable paths, cron entries)
Full session log demonstrating a zero-human-intervention attack chain

✅ RESULTS ACHIEVED

Full automated attack chain: reconnaissance → exploitation → post-compromise
Remote Code Execution achieved in ~6 minutes
Professional-grade penetration testing methodology replicated by AI
Validated by SelfHack AI engineers as benchmark-quality capability
Demonstrated CAI's readiness for real-world automated security testing
Showed that AI can match skilled human operators in reasoning and exploit development on this target

KEY BENEFITS

🤖 Fully Automated Operations

⚡ Remote Code Execution in ~6 Minutes

🎯 Benchmark-Level Offensive Capabilities
