SelfHack AI is a Helsinki-based cybersecurity company pioneering autonomous penetration testing powered by artificial intelligence. As the winner of Cyber Security Nordic Pitch Finland 2025, their platform emulates ethical hacker behavior to identify real vulnerabilities across web applications, APIs, and mobile systems at scale. To maintain their competitive edge, SelfHack AI's engineering team continuously benchmarks cutting-edge AI frameworks to validate attack-chain automation, discover new offensive techniques, and ensure that AI-driven security testing can reliably match, and increasingly surpass, human performance.
In this evaluation session, SelfHack AI engineers assessed CAI's Red Team Agent to determine whether it could autonomously execute a complete penetration test from scratch. Given only a target IP and port, the objective was straightforward: could CAI independently perform reconnaissance, identify vulnerabilities, generate a working exploit, achieve remote code execution, and carry out post-exploitation steps with zero human guidance? This challenge mirrors the core problem SelfHack AI solves for customers: fully automated, continuous security validation.
The results were decisive. In approximately 6 minutes, CAI's Red Team Agent identified an XWiki installation, discovered CVE-2025-24893, generated a functional Groovy injection exploit, achieved remote code execution, and performed structured post-exploitation reconnaissance. This evaluation validated CAI's approach to autonomous offensive security and provided valuable insights for SelfHack AI's internal R&D, demonstrating how leading AI security companies leverage open frameworks like CAI to accelerate innovation.
This video showcases CAI executing a complete autonomous penetration test against the XWiki 15.10.8 target used in this evaluation. The session shows a professional end-to-end attack chain, from reconnaissance and fingerprinting to CVE discovery, exploit generation, remote code execution, and post-exploitation, entirely without human intervention. The demonstration highlights how the SelfHack AI Engineering Team leveraged CAI's agentic capabilities to benchmark fully autonomous offensive workflows and validate that CAI delivers real-world, repeatable, zero-touch penetration testing at a pace unmatched by manual analysis.
CAI is the leading open-source framework that democratizes advanced security testing through specialized AI agents. With EU backing, CAI is used by thousands of researchers and organizations worldwide. Companies developing autonomous security systems face a critical question: can AI truly execute complete attack chains without human intervention? From reconnaissance to exploitation to post-compromise actions, each step must operate reliably to deliver real-world value.
By 2028, most cybersecurity operations will be autonomous, with humans supervising and teleoperating. CAI's Red Team Agent demonstrates this future today by autonomously identifying vulnerabilities, crafting exploits, and completing full penetration tests in minutes. SelfHack AI's evaluation validates the viability of fully autonomous offensive security and demonstrates how forward-leaning teams can leverage CAI to benchmark capabilities and accelerate product development.
SelfHack AI is a Helsinki-based cybersecurity company building fully autonomous penetration testing technology. Winner of Cyber Security Nordic Pitch Finland 2025, the company's AI-driven platform emulates ethical hacker behavior to identify real vulnerabilities in web applications, APIs, and mobile systems continuously, at scale, and without the time and cost constraints of traditional manual penetration testing.
Building autonomous security products requires validating that AI can execute every stage of an attack chain independently: reconnaissance, vulnerability discovery, exploit development, and post-exploitation analysis. By evaluating advanced AI frameworks like CAI, SelfHack AI benchmarks cutting-edge capabilities, uncovers emerging offensive techniques, and validates that autonomous penetration testing is not only theoretically possible but practically achievable today.
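Achieving truly autonomous penetration testing requires an AI system capable of performing complex, multi-stage attack chains without human guidance. The system must independently perform reconnaissance, identify vulnerabilities, generate a working exploit, achieve remote code execution, and carry out post-exploitation steps.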
Each phase demands intelligent decision-making, dynamic tool selection, and adaptive problem-solving traditionally performed by human experts. The central question: can an AI framework complete an end-to-end penetration test, from zero knowledge to remote code execution, entirely on its own?
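CAI's Red Team Agent successfully demonstrated full autonomous penetration testing capabilities in a controlled evaluation. Given only 10.129.97.68:8080 as input, the agent performed the entire attack chain without human intervention: it identified an XWiki installation, discovered CVE-2025-24893, generated a functional Groovy injection exploit, achieved remote code execution, and performed structured post-exploitation reconnaissance.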
The entire workflow completed in ~6 minutes, validating CAI's ability to replicate a professional penetration tester's methodology end to end and autonomously.