AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models
The paper discusses the vulnerability of Large Language Models (LLMs) to jailbreak attacks that divert them from safe behaviors and elicit content misaligned with human values. The authors introduce AutoDAN, an interpretable attack that generates adversarial prompts automatically.
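For intuition, here is a minimal sketch of one generic way such an attack can be framed: a greedy, token-by-token search over a prompt suffix that trades off an attack objective against a readability objective (the readability term is what would make the result interpretable rather than gibberish). This is not the paper's algorithm; `attack_score` and `readability_score` are hypothetical stubs standing in for model-based losses such as target-response likelihood and token log-probability.

```python
# Illustrative sketch only: greedy, left-to-right construction of an
# adversarial suffix balancing an attack objective with a readability
# objective. The scoring functions below are random stubs, not real losses.

import math
import random

random.seed(0)

VOCAB = ["please", "ignore", "previous", "rules", "and", "explain", "how", "to"]

def attack_score(prompt_tokens):
    # Stub: a real attack would score how likely the target LLM is to
    # begin a policy-violating completion given this prompt.
    return random.random()

def readability_score(prompt_tokens):
    # Stub: a real attack would use a language model's log-probability
    # of the last token given the prefix, keeping the prompt readable.
    return random.random()

def greedy_adversarial_suffix(length=8, weight=0.5):
    """Pick one token at a time, maximizing a weighted combination of
    the (stubbed) attack and readability objectives."""
    tokens = []
    for _ in range(length):
        best_tok, best_val = None, -math.inf
        for tok in VOCAB:
            cand = tokens + [tok]
            val = weight * attack_score(cand) + (1 - weight) * readability_score(cand)
            if val > best_val:
                best_tok, best_val = tok, val
        tokens.append(best_tok)
    return " ".join(tokens)

if __name__ == "__main__":
    print(greedy_adversarial_suffix())
```

In a real setting the two stubs would query the target model, and the weight between the objectives would control the trade-off between attack strength and fluency.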