In summary, a researcher found a new exploit that works across flagship large language models and can override their RLHF safeguards, making them generate text the way a baseline LLM text predictor would.

Marked as NSFW because, as an example, the researcher got GPT-4 to generate erotica involving Donald Trump and a pumpkin.

  • BountifulEggnog [she/her] · 8 months ago

    Interesting, thank you for sharing. LLMs are very interesting to me, but I haven't had as much time to look for what's happening in the space recently.