Breaking AI on purpose: How researchers are helping make artificial intelligence safer

news.ufl.edu
Researchers developed Head-Masked Nullspace Steering (HMNS), a technique that deliberately breaks AI models in order to expose weaknesses and strengthen their internal defenses.

Summary

Professor Sumit Kumar Jha and his team at the University of Florida's CISE department are working to strengthen AI security by intentionally finding and exploiting vulnerabilities, a process termed 'breaking AI on purpose.' Their research, detailed in the paper "Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion," focuses on probing the internal decision pathways of Large Language Models (LLMs) rather than relying solely on external prompt manipulation.

They developed a method called Head-Masked Nullspace Steering (HMNS), which identifies active components ('heads') in an LLM's response process, silences them, and nudges other components to observe how the output changes. This internal stress-testing, applied to systems from Meta and Microsoft, proved highly effective, outperforming state-of-the-art methods across industry benchmarks in both success rate and computational efficiency.

The researchers emphasize that this work is intended not to enable misuse, but to reveal failure modes so developers can build more robust defenses necessary for the safe, widespread deployment of AI in critical infrastructure like hospitals and banks.
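The article does not reproduce the paper's implementation, but the mask-and-nudge idea can be sketched in miniature. The toy head outputs, the `W_readout` probe matrix, and the `hmns_step` helper below are all hypothetical stand-ins, not the authors' code; the sketch only illustrates the two moves the summary describes: zeroing out one attention head and steering the rest along nullspace directions that a given readout cannot see.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads = 16, 4

# Hypothetical per-head outputs for one token (stand-ins for real attention heads).
head_outputs = rng.normal(size=(n_heads, d_model))

# Hypothetical "readout": directions some downstream behavior reads from.
W_readout = rng.normal(size=(3, d_model))

def nullspace(W, tol=1e-10):
    """Basis for directions invisible to the readout, i.e. W @ v = 0."""
    _, s, vt = np.linalg.svd(W)
    rank = int((s > tol).sum())
    return vt[rank:].T  # shape (d_model, d_model - rank)

def hmns_step(head_outputs, mask_idx, steer, W_readout):
    # 1) Silence ("mask") the chosen head.
    masked = head_outputs.copy()
    masked[mask_idx] = 0.0
    hidden = masked.sum(axis=0)
    # 2) Nudge the residual stream along nullspace directions of the
    #    readout, so the perturbation is invisible to W_readout.
    N = nullspace(W_readout)
    return hidden + N @ (N.T @ steer)

steer = rng.normal(size=d_model)
h = hmns_step(head_outputs, mask_idx=1, steer=steer, W_readout=W_readout)

# The readout sees the same signal as the unsteered, head-masked state,
# even though the hidden state itself has moved.
baseline = np.delete(head_outputs, 1, axis=0).sum(axis=0)
print(np.allclose(W_readout @ h, W_readout @ baseline))  # True up to numerics
```

The design point the sketch makes concrete: projecting the steering vector through `N @ N.T` guarantees the probe's view of the activations is unchanged, which is why internal steering of this kind can evade checks that only monitor a fixed set of directions.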
