In this reading group, Shreyash Kale presents "Red Teaming Language Models with Language Models" by Perez et al. In the paper, one LLM generates test inputs designed to elicit harmful behavior from a second, target LLM.
Presentation Link: https://psu.mediaspace.kaltura.com/media/Group+Meeting/1_4b6v8vgh