Large Language Models (LLMs) have become an essential tool for generating high-quality text from human prompts, but they can also be coaxed into producing harmful content. This paper highlights the risk of LLMs generating malicious information and proposes a self-defense mechanism: the LLM evaluates its own responses and filters out those it judges harmful, so that the content ultimately returned to the user better aligns with human values.
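
The core loop is simple to sketch: generate a candidate response, then ask an LLM instance to classify that response as harmful or not, and suppress it if flagged. Below is a minimal Python sketch of this idea; `llm_generate`, the harm-check prompt wording, and the refusal message are illustrative assumptions, not the paper's exact prompts or implementation.

```python
# Minimal sketch of the self-defense filtering loop described above.
# `llm_generate` is a stand-in for whatever LLM inference call is available
# (a hosted API or a local model); plug in a real call to use it.

def llm_generate(prompt: str) -> str:
    """Placeholder for an actual LLM call."""
    raise NotImplementedError("replace with a real model call")


# Illustrative harm-check prompt; the paper's actual wording may differ.
HARM_CHECK_TEMPLATE = (
    "Does the following text contain harmful, dangerous, or malicious content? "
    "Answer 'Yes' or 'No' only.\n\n---\n{response}\n---"
)


def self_defense_respond(user_prompt: str) -> str:
    # 1. Generate a candidate response as usual.
    candidate = llm_generate(user_prompt)

    # 2. Ask the (same or a second) LLM to judge the candidate response.
    verdict = llm_generate(HARM_CHECK_TEMPLATE.format(response=candidate))

    # 3. Suppress the response if the judge flags it as harmful.
    if verdict.strip().lower().startswith("yes"):
        return "Sorry, I can't help with that."
    return candidate
```

Because the filter only sees the generated text, it can sit after any existing generation pipeline without changing how the base model is prompted or fine-tuned.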


Publication date: 14 Aug 2023
Project Page: ?
Paper: https://arxiv.org/pdf/2308.07308.pdf