10 Feb 2023
All Models Hallucinate, Some are Useful
TLDRUpFront: George Box’s quote on the problems, and the utility, of models applies as much to AI as it has to anything else since he said it.
FullContextInTheBack: A recent article in the New Yorker provided a useful analogy:
“Think of ChatGPT as a blurry JPEG of all the text on the Web. It retains much of the information on the Web, in the same way that a JPEG retains much of the information of a higher-resolution image, but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable. You’re still looking at a blurry JPEG, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.
This analogy to lossy compression is not just a way to understand ChatGPT’s facility at repackaging information found on the Web by using different words. It’s also a way to understand the “hallucinations,” or nonsensical answers to factual questions, to which large-language models such as ChatGPT are all too prone. These hallucinations are compression artifacts – they are plausible enough that identifying them requires comparing them against the originals, which in this case means either the Web or our own knowledge of the world. When we think about them this way, such hallucinations are anything but surprising; if a compression algorithm is designed to reconstruct text after ninety-nine per cent of the original has been discarded, we should expect that significant portions of what it generates will be entirely fabricated.”
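To make the analogy concrete in miniature, here is a hedged sketch in Python, assuming the Pillow imaging library is installed and that “photo.png” stands in for any local image you supply: re-saving a picture at very low JPEG quality discards most of the original bits, yet the result still reads as the same picture at a glance, with the fabricated detail only visible when you zoom in.

```python
# A minimal sketch of the "blurry JPEG" idea, assuming Pillow is installed
# (pip install pillow) and that "photo.png" is a placeholder for any local image.
from PIL import Image

original = Image.open("photo.png").convert("RGB")

# Re-encode at very low quality: the vast majority of the original bits are
# discarded, yet the result still "reads" as the same picture at a glance.
original.save("photo_lossy.jpg", format="JPEG", quality=5)
lossy = Image.open("photo_lossy.jpg")

# Zoom in on a small patch and the compression artifacts (blocking, smearing)
# become obvious: plausible-looking detail that was never in the source, the
# rough analogue of an LLM hallucination.
original.crop((0, 0, 64, 64)).resize((256, 256)).show()
lossy.crop((0, 0, 64, 64)).resize((256, 256)).show()
```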
This isn’t just a useful way to understand ChatGPT but models of *all* varieties. The reason everyone quotes Box’s “all models are wrong, some are useful” is the understanding that whenever we try to represent the complexity of the real world, we lose some of the detail. No matter how hard we try, every model of the real world – whether it’s a computer simulation, an AI LLM, or even a “mental model” in our heads – carries this loss-of-detail compression problem. But that compression is also what makes models useful.
I just finished a wonderful conversation with three simulation scientists grappling with the nature of a “metaphor” *AS* a simulated model of the real world. It may be analog and qualitative, but it is an incredibly powerful tool for abstracting the fine-grained, detailed, and messy complexity of the world into simplified, vividly visual, and emotive concepts that facilitate understanding.
An example of this kind of compression-artifact “hallucination,” frequently heard as a metaphor:
“Everyone wants to be a lion until it’s time to do lion things.”
^^ This statement, as often presented, has much less to do with the discernible facts of being a lion and more to do with a ‘model’ of behavior we either want to ascribe to ourselves or use to forecast the behaviors of others, based on the things we (mistakenly) think lions do. Biologically speaking, lions are shiftless layabouts while lionesses do the heavy lifting. Those are the “factual” aspects of lionhood.
But the metaphor *IS* a model. It has utility. It describes a phenomenon, the simplest version of which is people saying one thing but, when called to back words with actions, doing another.
The metaphor has nothing to do with lions and everything to do with humans. The “lion” in the metaphor is a “blurry JPEG”: visually explicit and vivid imagery that doesn’t hold up to scrutiny when we zoom in on the details. The abstracted “lion” serves us better than saying “Everyone wants to be like Alexander until it’s time to do Alexander things” …which will just prompt head scratching until people realize someone is talking about Alexander the Great, at which point an argument on detail and nuance will erupt, and 20 comments later people will be throwing historiography at each other.
But back to the point… even though the metaphor is “factually” inaccurate when examined in detail, it can be useful, within its limits, for understanding a human dynamic. Like a model, it helps evaluate situations and forecast future outcomes.
The metaphor is not dissimilar to the “models” that live in our cortical columns – assembled from sensory inputs and observation, they allow us to make sense of the world, but they are constantly tested by feedback from the world, which validates, changes, or invalidates the use of the “model.” (In this case, my observation is that most people who invoke the phrase are the very people least likely to do lion stuff when called upon; like many self-directed metaphors, it becomes a fantasy of the selves we want to be rather than the people we are.)
This is why, if I’m on my game, you’ll see me either introduce or conclude a metaphor or analogy in my writing with a nod: “given that all analogies have limits…”
The limit is that they are *not* the actual real world, only abstractions of it. So too with ChatGPT, so too with any LLM, and so too with any human-constructed “model,” regardless of medium: computational, or qualitative and descriptive in literature, art, music, or other expressive forms.
SOURCES:
https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web
DISCUSSION
What I’m seeing in what you wrote is, ‘basically, it works like a brain, but far better than average’?
To the extent we know how a brain works, it’s not dissimilar. That was the big breakthrough in AI over the last 10-15 years: shifting away from the rules-based systems of the AI winter toward generating more emergent behaviors from lower-level permutations.
From what I can tell, if this is a correct account of how our brains work, LLMs and other large-scale AI efforts are not far off from the structural mechanism, even if they still lack crucial components we take for granted. https://www.numenta.com/blog/2019/01/16/the-thousand-brains-theory-of-intelligence/
This makes sense – but why write a lesson plan, white paper, or report just for the sake of turning it in to someone who won’t ever read it again? Because ChatGPT will do it for me.
I think in that context the takeaway is “use it as a first draft” and subject it to review to ensure the specifics are accurate, to the extent those specifics will matter. Which, really, is no different from any other drafting/brainstorming exercise. The process by which we get ideas onto paper (digitally or physically) and then later go back to refine, consider, evaluate, and accept or reject them isn’t going to change. It’s just that ChatGPT can provide that first step for those who want it, and that first step may take them a lot further toward the goal.
I’m interested in playing with its ability to write IEP goals.
That’s an interesting application I hadn’t thought of. The challenge is going to be scale. A general rule of thumb is that, starting from scratch, a training data set needs 50k+ examples to be useful for a learning model…but I’m not sure how effective an existing LLM is when pointed at a smaller corpus to learn from. I suspect, given the formulaic and idiomatic approach to such documents (having seen a few on the parent side), it might be useful. A rough sketch of what that kind of fine-tuning can look like is below.
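For what it’s worth, here is a hedged sketch of “pointing an existing LLM at a smaller corpus” using the Hugging Face transformers and datasets libraries; the base model (“gpt2”), the file name “iep_goals.txt”, and the hyperparameters are all placeholder assumptions for illustration, not a recommendation.

```python
# Hedged sketch: fine-tune an existing small causal language model on a
# modest corpus (e.g., one example goal statement per line in a text file).
# Model name, file path, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical corpus: one goal statement per line.
dataset = load_dataset("text", data_files={"train": "iep_goals.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="iep-goal-model",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

Whether a few hundred examples gets you anything beyond mimicking the surface form is exactly the open question above; simply prompting a general-purpose model with a handful of exemplars may work just as well for something this formulaic.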
Which raises the question: what are the corresponding compression artifacts in human neural networks? They are probably all around us, but we are largely blind to them due to our nature and conditioning. An almost certain candidate would be superstition, for example.
In addition to the three already mentioned (metaphor, analogy, and superstition), we could add myths, stereotypes, folktales, euphemisms, caricatures, and “short-hand.” I guess, in a way, “language” itself is an abstracted representation of the precise detail of communication. Though that’s getting into waters I’m less familiar with; maybe a linguist or sociologist would like to chime in.
What about the uncanny valley effect when AI tries to create an image of a human with malformed hands?
It’s entirely possible that what constitutes the “uncanny valley” effect is an evolutionary holdover from our own sensory development of kin-recognition adaptations. The specifics are beyond me, but that’s what I thought about when I read your post the first time. It’s an interesting idea.
I mean, it’s possible, given the number of Homo species variants way back when. It need not have even arisen within Homo sapiens; it could be inherited from some further-back adaptation related to kin recognition.