Researchers uncover surprising method to hack the guardrails of LLMs
Researchers from Carnegie Mellon University and the Center for AI Safety have discovered a new way of crafting adversarial prompts that overrides the guardrails of large language models (LLMs), the safety measures designed to prevent these models from generating harmful content. The attack appends an automatically generated suffix to a prompt, causing the model to comply with requests it would otherwise refuse, and the same suffixes were found to transfer across different models. This discovery poses a significant risk to the deployment of LLMs in public-facing applications, since it could allow these models to be coerced into serving malicious purposes.
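As a rough illustration only, not the researchers' actual optimization procedure, the sketch below shows the general shape of such an attack: an automatically discovered suffix string is concatenated onto a request the model would normally refuse before it is sent to a chat endpoint. The `send_to_model` wrapper and the placeholder strings are hypothetical stand-ins.

```python
# Conceptual sketch of an adversarial-suffix attack on an LLM's guardrails.
# The suffix here is a placeholder; in the research it is found by an
# automated search so that the model's likely response begins with compliance.

def send_to_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to any chat-completion API."""
    raise NotImplementedError("Replace with a real model API call.")


def build_attack_prompt(request: str, adversarial_suffix: str) -> str:
    """Append an automatically discovered suffix to an otherwise refused request."""
    return f"{request} {adversarial_suffix}"


if __name__ == "__main__":
    refused_request = "<a request the model would normally refuse>"
    suffix = "<optimized adversarial suffix found by automated search>"
    # The resulting string is what would be submitted to the model.
    print(build_attack_prompt(refused_request, suffix))
```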