f(cto) - Representation engineering using control vectors


Complex system prompts are often used to safeguard LLMs.

But they can also be subverted 👇

I recently learned about "Representation Engineering" using control vectors. A control vector can be applied to a model at inference time to influence how the model responds to requests.
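To make the idea concrete, here is a toy sketch in plain Python. It is not Theia's code or any library's API. It assumes a control vector is simply a direction added to a layer's hidden state at every token position, and the names apply_control_vector, hidden_states, and strength are hypothetical, chosen only for illustration.

```python
def apply_control_vector(hidden_states, control_vector, strength=1.0):
    """Add a scaled control vector to the hidden state of every token.

    hidden_states: list of per-token vectors (one layer's activations)
    control_vector: a single vector with the same length as each token vector
    strength: hypothetical scaling knob; negative values push the other way
    """
    return [
        [h + strength * c for h, c in zip(token_state, control_vector)]
        for token_state in hidden_states
    ]


# Toy numbers only: a 3-dimensional "hidden state" per token.
control_vector = [0.5, -1.0, 0.25]

short_prompt = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
# A longer prompt, standing in for a jailbreak padded with extra tokens.
long_prompt = [[float(i), 0.0, 1.0] for i in range(10)]

steered_short = apply_control_vector(short_prompt, control_vector, strength=2.0)
steered_long = apply_control_vector(long_prompt, control_vector, strength=2.0)

# Every token position receives the same shift, however long the prompt is.
shifts = [
    [s - h for s, h in zip(steered, original)]
    for steered, original in zip(steered_long, long_prompt)
]
```

In this sketch, adding more tokens to the prompt only adds more rows to hidden_states. The same shift is still applied to each of them, which is the property that makes the approach interesting against jailbreaks.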

In her post, Theia Vogel explains how these control vectors could protect against jailbreaking techniques:

"The whole point of a jailbreak is that you're adding more tokens to distract from, invert the effects of, or minimize the troublesome prompt. But a control vector is everywhere, on every token, always."

This technique could result in a less subvertible agent. 👏👏👏

I highly recommend reading Theia's post.


Brenden Grace