Skip to content

Instantly share code, notes, and snippets.

@yoavg
Created September 9, 2024 20:23
Show Gist options
  • Save yoavg/4e4b48afda8693bc274869c2c23cbfb2 to your computer and use it in GitHub Desktop.
Save yoavg/4e4b48afda8693bc274869c2c23cbfb2 to your computer and use it in GitHub Desktop.
Is telling a model to "not hallucinate" absurd?

Is telling a model to "not hallucinate" absurd?

Can you tell an LLM "don't hallucinate" and expect it to work? my gut reaction was "oh this is so silly" but upon some reflection, it really isn't. There is actually no reason why it shouldn't work, especially if it was preference-fine-tuned on instructions with "don't hallucinate" in them, and if it a recent commercial model, it likely was.

What does an LLM need in order to follow an instruction? It needs two things:

  1. an ability to perform then task. Something in its parameters/mechanism should be indicative of the task objective, in a way that can be influenced. (In our case, it should "know" when it hallucinates, and/or should be able to change or adapt its behavior to reduce the chance of hallucinations.)
  2. an ability to ground the instruction: the model should be able to associate the requested behavior with its parameters/mechanisms. (In our case, the model should associate "don't hallucinate" with the behavior related to 1).

Number (2) is easy to achieve with fine-tuning, assuming (1) exists. Does (1) exist? There is evidence that yes, it does. Presumably, "retrieving from memory" and "improvising an answer" are two different model behaviors, which use different internal mechanisms. Indeed, we can probe model inner layers and infer if it is "lying"1 or if "the question is unanswerable"2. These are very much related to "hallucinations". And if we can do it, then why can't a model use this internally when fine-tuned on contrastive examples, as in what happens in preference fine-tuning? Another possible trainable behavior to reduce hallucinations is to make the output distribution sharper, in order to reduce the chance of wrong random sampling (only if the answer is in the parameters, of course).

So, given these pieces of information, yes, LLM can be trained to reduce hallucinations upon request. And given the prominence and popularity of the term, strong new models likely were trained for exactly that. This is not an absurd or silly instruction. Maybe it's absurd in the sense that you have to explicitly request it, and that the models weren't trained to always reduce hallucinations. On the other hand, maybe always trying to avoid hallucinations has some other undesired consequences, which model trainers and product managers would like to avoid. (Actually, if you are in a position to know of such undesired consequences, and are free to tell the world about them, I will be really curious to learn more!)

Footnotes

  1. https://arxiv.org/abs/2304.13734

  2. https://arxiv.org/abs/2310.11877

@impredicative
Copy link

impredicative commented Sep 10, 2024

It's not absurd at all. I use it with much success. It helps up to a point. Also, it helps more easily with larger models than with smaller models, although it works for both. If however you want to ask it not to hallucinate, it's better to do so in other words (without using the "h word").

It also helps to give it the freedom to reject your input. Forcing it to go forward with bad inputs is what leads to half of the hallucinations.

@MostAwesomeDude
Copy link

Semantic grounding isn't possible for wholly syntactic systems. In that sense, yes, asking for reduced confabulations is absurd; or, to be more generous, it's misguided because no amount of syntactic adjustment can reveal which possible underlying semantic world is the real world.

@stuaxo
Copy link

stuaxo commented Sep 10, 2024

Of course it can't be foolproof but it makes sense that it would lessen certain kinds of error.

@shauray8
Copy link

I agree with @impredicative's statement. Larger models handle uncertainty better, and allowing them the freedom to reject flawed inputs significantly reduces hallucinations. Also, using the “h-word” explicitly might force the model into hallucinating more, as evident from https://arxiv.org/abs/2402.07896. So, it's always better to instruct the model in different ways.

@impredicative
Copy link

impredicative commented Sep 14, 2024

I now have two public-interest projects, podgenai (using gpt-4-0125-preview) [prompts] and newssurvey (using gpt-4o-2024-08-06) [prompts], both of which are using reasonably tailored prompts to almost completely eliminate hallucinations from long text outputs. At least you may struggle to find any in their outputs.

It requires some amount of iterative updates to the prompts to improve them based on the results until things stabilize.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment