New TokenBreak Attack Bypasses AI Moderation with Single-Character Text Changes - The Hacker News
The TokenBreak attack exploits specific tokenization strategies (BPE or WordPiece) in text classification models by introducing single-character changes, bypassing AI moderation guardrails. This vulnerability facilitates prompt injection attacks where subtle input modifications enable malicious outputs while remaining comprehensible to the LLM.
Source: Original Report ↗