

Is this necessarily true? I remember seeing an article a while back suggesting that prompting “do not hallucinate” is enough to meaningfully reduce the risk of hallucinations in the output.
From my fairly superficial understanding of how LLMs work, prompting “don’t do X” will produce a completely different vector along the “X” semantic dimension than prompting “do X”. This is different to telling a human, for example, not to think about elephants (congratulations, you’re now thinking about elephants. Aren’t they cute? Look at that little trunk and smiley mouth).


The only comfort I take from your reply is that you lost a little bit before me.