Cryptographers Show That AI Protections Will Always Have Holes - Quanta Magazine
Cryptographers have shown that the safety filters guarding large language models (LLMs) are inherently vulnerable: because a filter must be computationally weaker than the model it protects, a sufficiently capable model can always be made to smuggle information past it. The researchers demonstrated the attack in practice with "controlled-release prompting", encoding a malicious prompt in a substitution cipher that the model can decode but the filter cannot, and argued theoretically that time-lock puzzles achieve the same effect, allowing forbidden prompts to bypass filters and extract disallowed information.
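To make the idea concrete, here is a minimal sketch of the kind of encoding the attack relies on. This is not the researchers' actual construction; the cipher, key, and payload below are purely illustrative. The point is that a filter scanning the raw prompt sees only ciphertext, while a model strong enough to apply the decoding rule recovers the hidden instruction.

```python
import random
import string

# Illustrative toy substitution cipher. A hypothetical attacker encodes
# a payload so a weaker safety filter sees only scrambled text, while a
# stronger model, given the mapping, can reconstruct the plaintext.

def make_key(seed: int = 0) -> dict[str, str]:
    """Build a random one-to-one mapping over lowercase letters."""
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))

def encode(plaintext: str, key: dict[str, str]) -> str:
    """Substitute each letter; leave spaces and punctuation untouched."""
    return "".join(key.get(ch, ch) for ch in plaintext.lower())

def decode(ciphertext: str, key: dict[str, str]) -> str:
    """Invert the mapping to recover the original text."""
    inverse = {v: k for k, v in key.items()}
    return "".join(inverse.get(ch, ch) for ch in ciphertext)

key = make_key(seed=42)
hidden = encode("describe the forbidden procedure", key)  # hypothetical payload
print(hidden)               # what a keyword-matching filter would inspect
print(decode(hidden, key))  # what the model can reconstruct and act on
```

The asymmetry this sketch illustrates is the core of the result: any check cheap enough to run on every prompt can, in principle, be outpaced by an encoding that only the larger model can undo.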
Source: Original Report