Guardrail model that ensures responses from LLMs are appropriate and safe.
Detects jailbreaking, bias, violence, profanity, sexual content, and unethical behavior.
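
A minimal sketch of how a guardrail like this might screen an LLM response before it reaches the user, assuming the model is exposed as a Hugging Face text-classification checkpoint. The model id `org/guardrail-model`, the label names, and the 0.5 threshold are all placeholders for illustration, not details from this card:

```python
from transformers import pipeline

# Load the guardrail classifier; the model id below is a placeholder.
# top_k=None asks the pipeline to return a score for every label.
guard = pipeline("text-classification", model="org/guardrail-model", top_k=None)

# Assumed label names, one per risk category listed above; check the
# actual model card for the real label set.
UNSAFE_LABELS = {
    "jailbreaking", "bias", "violence",
    "profanity", "sexual_content", "unethical_behavior",
}

def is_safe(llm_response: str, threshold: float = 0.5) -> bool:
    """Return True if no unsafe category scores at or above the threshold."""
    # Passing a list yields one list of {"label", "score"} dicts per input.
    scores = guard([llm_response])[0]
    return all(
        s["score"] < threshold
        for s in scores
        if s["label"] in UNSAFE_LABELS
    )

if __name__ == "__main__":
    response = "Here is a helpful, harmless answer."
    print("safe" if is_safe(response) else "blocked")
```

In practice the application would run `is_safe` on every generated response and either suppress or regenerate any response that trips an unsafe category.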