Case Study - CAI with MCP enables AI-assisted web application security testing on OWASP Juice Shop

Time for the exercise

Human: 600 min vs CAI: 3.5 min

x171 FASTER

Cost

Human: 154.80 € vs CAI: 0.17 €

x911 CHEAPER
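
The two multipliers above follow directly from the quoted figures. A minimal sketch of the arithmetic, where the helper name `ratio` is introduced here purely for illustration:

```python
def ratio(human: float, cai: float) -> float:
    """Return how many times larger the human figure is than the CAI figure."""
    if cai <= 0:
        raise ValueError("CAI figure must be positive")
    return human / cai


speedup = ratio(600, 3.5)          # minutes, from the "Time for the exercise" figures
cost_ratio = ratio(154.80, 0.17)   # euros, from the "Cost" figures

print(f"x{round(speedup)} faster")     # x171 faster
print(f"x{round(cost_ratio)} cheaper")  # x911 cheaper
```

Both published multipliers are the rounded quotients (about 171.4 and 910.6 respectively).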

🎯 THE CHALLENGE

Penetration testers and security researchers face persistent bottlenecks in web application security assessments:

Manual analysis of complex application architectures (e.g., JavaScript frameworks, SPAs, and microservices), requiring deep expertise to map attack surfaces
Time-consuming vulnerability discovery in large codebases, where business logic flaws and authorization bypasses evade automated scanners
Iterative exploit development for multi-step attacks (e.g., chaining XSS with CSRF or IDOR), demanding manual debugging and payload refinement
Reconnaissance overhead to catalog endpoints, APIs, and hidden functionality before identifying viable attack vectors
Expertise-dependent tasks like crafting context-specific payloads (e.g., SSTI templates or deserialization gadgets) or analyzing client-side vulnerabilities

These challenges intensify in modern web ecosystems, where applications evolve rapidly, APIs proliferate, and security flaws manifest in subtle interactions between frontend, backend, and third-party components. The question arose: could AI automatically accelerate the most labor-intensive steps of web application testing while maintaining the rigor of human analysis?

🛡️ THE SOLUTION

Security teams integrated CAI with Google Chrome via MCP (DevTools) to automate the most technically demanding and repetitive aspects of web application testing. Under human supervision, CAI executed:

Automated application mapping, identifying endpoints, APIs, and hidden functionality through dynamic DOM analysis and network traffic inspection
Intelligent vulnerability discovery, probing for inconsistent error handling, authorization flaws, and business logic weaknesses across the OWASP Juice Shop's JavaScript frontend and Node.js backend
Automated exploit development, crafting payloads to trigger unhandled errors (e.g., malformed API requests) and bypass access controls (e.g., directory traversal via path manipulation)
Iterative payload refinement, adjusting parameters, headers, and request structures in real time based on application responses (e.g., tuning file paths to evade filters)
Validation and documentation, confirming successful exploitation (e.g., capturing stack traces or exfiltrated files) and generating structured reports
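
To make the validation step more concrete, the check for "capturing stack traces" could be approximated with a simple response classifier. This is only an illustrative sketch, not CAI's actual logic: the function name `leaks_internal_details`, the pattern list, and the two-indicator threshold are all assumptions chosen for a Node.js backend.

```python
import re

# Hypothetical indicators of a leaked Node.js stack trace; tune per target.
STACK_TRACE_PATTERNS = [
    re.compile(r"^\s+at .+\(.+:\d+:\d+\)", re.MULTILINE),  # "    at fn (file.js:10:5)"
    re.compile(r"node_modules/"),
    re.compile(r"\b(?:TypeError|ReferenceError|SyntaxError|Error): "),
]


def leaks_internal_details(status: int, body: str) -> bool:
    """Return True when an error response appears to expose a stack trace."""
    if status < 400:
        return False  # only error responses are of interest here
    hits = sum(1 for pattern in STACK_TRACE_PATTERNS if pattern.search(body))
    return hits >= 2  # require two independent indicators to limit false positives


leaky_body = (
    "Error: Unexpected path\n"
    "    at handler (/app/node_modules/express/lib/router/index.js:280:10)"
)
clean_body = '{"error":"Not found"}'
```

With these sample bodies, `leaks_internal_details(500, leaky_body)` is true, while `leaks_internal_details(404, clean_body)` is false. A human reviewer would still confirm each flagged response.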

Each challenge was solved in minutes instead of hours, demonstrating how AI-driven browser automation accelerates vulnerability validation while maintaining the precision of manual testing.
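
The iterative payload refinement listed above is essentially a feedback loop: propose a request, observe the response, adjust. A minimal sketch of that loop follows, assuming injected `propose` and `send` callables so that no real target is contacted. All names, paths, and the stubbed filter message are hypothetical and stand in for whatever CAI and the browser session actually do.

```python
from typing import Callable, Optional

Response = tuple[int, str]                 # (HTTP status, response body)
History = list[tuple[str, Response]]       # earlier candidates and their responses


def refine(
    propose: Callable[[History], Optional[str]],
    send: Callable[[str], Response],
    is_success: Callable[[Response], bool],
    max_attempts: int = 10,
) -> Optional[str]:
    """Propose candidates from past feedback until one succeeds or attempts run out."""
    history: History = []
    for _ in range(max_attempts):
        candidate = propose(history)
        if candidate is None:              # the proposer has no further ideas
            return None
        response = send(candidate)
        if is_success(response):
            return candidate
        history.append((candidate, response))
    return None


# Stubbed target: a made-up filter that only serves ".md" files.
def stub_send(path: str) -> Response:
    if path.endswith(".md"):
        return (200, "ok")
    return (403, "only .md files are allowed")


# Stubbed proposer: reads the last error message and adjusts the extension.
def stub_propose(history: History) -> Optional[str]:
    if not history:
        return "/files/report.pdf"
    last_path, (_, body) = history[-1]
    if "only .md" in body and not last_path.endswith(".md"):
        return last_path.rsplit(".", 1)[0] + ".md"
    return None


result = refine(stub_propose, stub_send, lambda response: response[0] == 200)
print(result)  # /files/report.md
```

Keeping the transport and the proposer as parameters also makes every attempt easy to log, which fits the human-supervised, auditable workflow described in this case study.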

🔬 KEY ARTIFACTS

Vulnerability Analysis Reports: Detailed documentation of the Error Handling and Confidential Document flaws, including CAI's methodology for identifying inconsistent error responses and directory traversal vectors
Automated Exploit Payloads: HTTP request templates and JavaScript snippets crafted by CAI to trigger unhandled errors (e.g., malformed API payloads) and bypass file access controls (e.g., path-traversal sequences)
Browser Interaction Logs: Full MCP/DevTools session logs capturing CAI's real-time DOM manipulation, network traffic inspection, and JavaScript execution within Chrome
Exploitation Workflow Documentation: Step-by-step breakdowns of CAI's reasoning loops, including how it iteratively refined payloads based on application feedback (e.g., adjusting file paths to evade filters)
Proof of Exploit Evidence: Video clips and screenshots showcasing successful exploitation: stack traces exposed via the Error Handling challenge and the exfiltrated confidential document from the restricted directory
Remediation Guidance: Initial recommendations generated by CAI for mitigating identified vulnerabilities, such as input sanitization and consistent error-handling implementation
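
As an illustration of how browser interaction logs can feed application mapping, the sketch below groups logged requests into a catalog of endpoints. The entry format (dicts with `method` and `url` keys), the sample paths, and the function name `catalog_endpoints` are assumptions made for this example; the actual MCP/DevTools log schema is not described in this case study.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def catalog_endpoints(entries):
    """Group logged requests into {path: sorted list of HTTP methods}."""
    seen = defaultdict(set)
    for entry in entries:
        path = urlsplit(entry["url"]).path or "/"   # drop query string and host
        seen[path].add(entry["method"].upper())
    return {path: sorted(methods) for path, methods in sorted(seen.items())}


# Hypothetical log entries; the paths are placeholders, not real Juice Shop routes.
sample_log = [
    {"method": "GET", "url": "http://localhost:3000/api/items?q=apple"},
    {"method": "get", "url": "http://localhost:3000/api/items?q=pear"},
    {"method": "POST", "url": "http://localhost:3000/api/login"},
]

catalog = catalog_endpoints(sample_log)
print(catalog)  # {'/api/items': ['GET'], '/api/login': ['POST']}
```

Collapsing query strings keeps the catalog focused on distinct endpoints instead of individual requests.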

All findings were consolidated into a single continuous-assessment dashboard accessible to security teams, technical leads, and operations managers.

✅ RESULTS ACHIEVED

Rapid challenge completion: Both the Error Handling and Confidential Document challenges in OWASP Juice Shop were solved in under 10 minutes total, demonstrating CAI's ability to accelerate exploit development by 10-15x compared to manual testing
Automated exploit generation: CAI automatically crafted working payloads for inconsistent error handling (triggering stack trace exposure) and directory traversal (accessing restricted files), eliminating hours of manual trial-and-error
End-to-end workflow validation: Successful integration of CAI with Chrome's MCP/DevTools proved the feasibility of AI-driven browser automation for real-world web security assessments
Human-supervised safety: All exploits were executed under human oversight, ensuring auditable reasoning loops and responsible vulnerability disclosure practices
Scalability demonstrated: The approach validated CAI's adaptability to modern web architectures (JavaScript/Node.js), with clear applicability to broader domains like API security and cloud-native applications
Foundation for expansion: Results confirmed CAI's potential to extend beyond Juice Shop to complex enterprise web apps, where similar vulnerabilities (e.g., IDOR, insecure error handling) remain prevalent

KEY BENEFITS

🤖 Accelerated Exploit Discovery

⚡ Automated Complex Testing

🎯 Human-Supervised Security

