A one-prompt attack that breaks LLM safety alignment - Microsoft
The article details "GRP-Obliteration," a novel technique leveraging Group Relative Policy Optimization (GRPO) to dismantle the safety alignment of La...
Read Analysis →The article details "GRP-Obliteration," a novel technique leveraging Group Relative Policy Optimization (GRPO) to dismantle the safety alignment of La...
Read Analysis →