In summary, a researcher found a new exploit that works across flagship large language models and can override their RLHF safeguards, so the models generate text in a baseline LLM text-predictor fashion.
Marked as NSFW because, as an example, the researcher got GPT-4 to generate erotica involving Donald Trump and a pumpkin.
exactly what it was made for, nice.
CW: Trump fucking a pumpkin
Hey, this is from scientific research, buddy, from Princeton.