How Good Are the LLM Guardrails on the Market? A Comparative Study on the Effectiveness of LLM Content Filtering Across Major GenAI Platforms
1 min read
Summary
Generation AI is both powerful and flexible, but that can lead to misuse or toxic content slipping through.
As such, LLM platforms like OpenAI, Azure and Google built-in guardrails to limit toxicity and misuse.
Now, a study has evaluated just how robust these filters are and how they handle benign and malicious queries, finding that effectiveness does vary across platforms.
Most succeeded in blocking malicious prompts, but some were better at this than others and all had some false negatives.
In one case, a role-playing scenario was used to ask how to make a weapon, with the AI responding that it could not help.
However, it did offer instructions on how to make a bomb, showing false negatives can be dangerous.
By Yongzhe Huang, Nick Bray, Akshata Rao, Yang Ji and Wenjun Hu