Even people hallucinate. Under your definition intelligence doesn’t exist
“Hallucination” is an anthropomorphized term for what’s happening. The actual cause is much simpler, there’s no semantic distinction between true and false statements. Both are equally plausible as far as a language model is concerned, as long as it’s semantically structured like an answer to the question being asked.
That’s also pretty true for people, unfortunately. People are deeply incapable of differentiating fact from fiction.
Like how many, five?
No that’s not it at all. People know that they don’t know some things. LLMs do not.
Exactly, the LLM isn’t “thinking,” it’s just matching inputs to outputs with some randomness thrown in. If your data is high quality, a lot of the time the answers will be appropriate given the inputs. If your data is poor, it’ll output surprising things more often.
It’s a really cool technology in how much we get for how little effort we put in, but it’s not “thinking” in any sense of the word. If you want it to “think,” you’ll need to put in a lot more effort.
Your brain is also “just” matching inputs to outputs using complex statistics, a huge number of interconnects and clever digital-analog mixed ionic circuitry.
Wow whoosh. The point is that “AI” isn’t actually “intelligent” like a human and thus can’t “hallucinate” like an intelligent human.
All of this anthropomorphic terminology is just misleading marketing bullshit.
Who said anything about human intelligence? AIs have a different kind of intelligence, an artificial kind. I’m tired of pretending they don’t
Ever heard of the Turing test? Ever since AIs could pass it it became not a thing. Before that, playing Go was the mark of AI.
Any time an AI achieves a new thing people move goalposts. So I ask you: what does AI need to achieve to have intelligence?
In place of the Turing test we have a new test that informs us whether an individual can properly identify a stochastic parrot
People can mean different things. Intelligence can mean a calculator doing a sum, and it can mean the way humans talk to each other. AI can do some intelligent things without people agreeing that it’s intelligent in the latter sense.
https://en.m.wikipedia.org/wiki/Turing_test
Here you go since you’ve heard of it but don’t understand it.
Current AIs pass it, since most people can’t reasonably tell between AI and human-written stuff every time
It’s dead simple to see if you’re talking to an LLM. The latest models don’t pass the Turing test, not even close. Asking them simple shit causes them to crap themselves really quickly.
Ask ChatGPT how many r’s there are in “veryberry”. When it gets it wrong, tell it you’re disappointed and expect a correct answer. If you do that repeatedly, you can get it to claim there’s more r’s in the word than it has letters.
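For reference, the ground truth for this test is easy to check with ordinary string handling. A minimal Python sketch (the helper name `count_letter` is made up for illustration):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


word = "veryberry"
# Prints: 3 r's out of 9 letters
print(count_letter(word, "r"), "r's out of", len(word), "letters")
```

A language model works on tokens rather than individual characters, which is one commonly cited reason this kind of letter-counting question trips it up.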
that’s it? you asked one question and that was enough for you?
It’s quite easy to identify an AI when you’re talking to one. To be fair, you need to actually run the Turing test since it removes confirmation bias
Here’s what I got:
Can you show the question you asked that led to this and which model was used? I just tested in several models, even slightly older ones and they all answered precisely. Of course if you follow up and tell it the right answer is wrong you can make it say stuff like this, but not one got it wrong out of the gate.
The same thing actually passing a Turing test would require. You’ve obviously read the words “Turing test” somewhere and thought you understood what it meant, but no robot we’ve ever produced as a species has passed the Turing test. It EXPLICITLY requires that intelligence equal to (or indistinguishable from) HUMAN intelligence is shown. Without a liar reading responses, no AI we’ll produce for decades will pass the Turing test.
No large language model has intelligence. They’re just complicated call and response mechanisms that guess what answer we want based on a weighted response system (we tell it directly or tell another machine how to help it “weigh” words in a response). Obviously with anything that requires massive amounts of input or nuance, like language, it’ll only be right about what it was guided on, which is limited to areas it is trained in.
We don’t have any novel interactions with AI. They are regurgitation engines, bringing forward sentences that aren’t theirs piecemeal. Given ten messages, I’m confident no major LLM would pass a Turing test.
The chat bots will pass the Turing test in a few years, maybe 5. Would that be intelligence then?
The Turing test is flawed, because while it is supposed to test for intelligence it really just tests for a convincing fake. Depending on how you set it up I wouldn’t be surprised if a modern LLM could pass it, at least some of the time. That doesn’t mean they are intelligent, they aren’t, but I don’t think the Turing test is good justification.
For me the only justification you need is that they predict one word (or even letter!) at a time. ChatGPT doesn’t plan a whole sentence out in advance, it works token by token… The input to each prediction is just everything so far, up to the last word. When it starts writing “As…” it has no concept of the fact that it’s going to write “…an AI language model” until it gets through those words.
Frankly, given that fact it’s amazing that LLMs can be as powerful as they are. They don’t check anything, think about their answer, or even consider how to phrase a sentence. Everything they do comes from predicting the next token… An incredible piece of technology, despite its obvious flaws.
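The token-by-token loop described here can be sketched in a few lines of Python. This is a toy under loud assumptions: `BIGRAM_SCORES`, `next_token` and `generate` are invented for illustration, and the lookup only uses the last token, whereas a real LLM conditions on the whole context so far.

```python
import random

# Hypothetical toy "model": for each previous token, scores for the next token.
# A real LLM conditions on the entire context, not just the last token.
BIGRAM_SCORES = {
    "<start>": {"As": 1.0},
    "As": {"an": 1.0},
    "an": {"AI": 1.0},
    "AI": {"language": 1.0},
    "language": {"model": 1.0},
    "model": {"<end>": 1.0},
}


def next_token(context, rng):
    """Pick one next token given everything generated so far."""
    scores = BIGRAM_SCORES[context[-1]]  # toy shortcut: look at last token only
    tokens = list(scores)
    weights = [scores[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(max_tokens=10, seed=0):
    """Generate one token at a time; nothing is planned ahead of the loop."""
    rng = random.Random(seed)
    context = ["<start>"]
    for _ in range(max_tokens):
        token = next_token(context, rng)
        if token == "<end>":
            break
        context.append(token)  # the new token becomes part of the next input
    return context[1:]


print(" ".join(generate()))  # prints: As an AI language model
```

The point of the sketch is only the shape of the loop: each step sees the tokens produced so far and emits exactly one more.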
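“The Turing test is flawed, because while it is supposed to test for intelligence it really just tests for a convincing fake.”
This is just conjecture, but I assume this is because the question of consciousness is not really falsifiable, so you just kind of have to draw an arbitrary line somewhere.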
Like, maybe tech gets so good that we really can’t tell the difference, and only god knows it isn’t really alive. But then, how would we know not to give the machine legal rights?
For the record, ChatGPT does not pass the Turing test.
ChatGPT is not designed to fool us into thinking it’s a human. It produces language with a specific tone & direct references to the fact it is a language model. I am confident that an LLM trained specifically to speak naturally could do it. It still wouldn’t be intelligent, in my view.
The Turing Test says that any person could have any conversation with a machine and there’s no chance you could tell it’s a machine. It does not say that one person could have one conversation with a machine and not be able to tell.
Current text generation models out themselves all the damn time. It can’t actually understand the underlying concepts of words. It just predicts what bit of text would be most convincing to a human based on previous text.
Playing Go was never the mark of AI, it was the mark of improving game-playing machines. It doesn’t represent “intelligence”, only an ability to predict what should happen next based on a set of training data.
It’s worth noting that after Lee Sedol lost to AlphaGo, researchers found a fairly trivial Go strategy that could reliably beat the machine. It was simply such an easy strategy to counter that none of the games in the training data had included anyone attempting that strategy, so the algorithm didn’t account for how to counter it. That’s because the computer doesn’t know Go theory; it only knows how to predict what to do next based on the training data.
Detecting the machine correctly once is not enough. You need to guess correctly most of the time to statistically prove it’s not by chance. It’s possible for some people to do this, but I’ve seen a lot of comments on websites accusing HUMAN answers of being written by AIs.
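To put a number on “not by chance”: a rough Python sketch of a one-sided binomial check, assuming each trial is an independent human-or-machine guess with a 50% chance of being right by luck (the function name `p_value_at_least` and the 16-of-20 figures are illustrative, not from any study).

```python
from math import comb


def p_value_at_least(correct: int, trials: int, chance: float = 0.5) -> float:
    """Probability of getting at least `correct` right out of `trials` by pure guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# One correct detection in one try proves nothing: a coin flip does as well.
print(p_value_at_least(1, 1))    # 0.5

# 16 correct out of 20 is hard to explain by luck (roughly 0.006).
print(p_value_at_least(16, 20))
```

Under these assumptions a single correct call is indistinguishable from guessing, while a long run of mostly correct calls is not.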
If the current chat bots improve to reliably not be detected, would that be intelligence then?
KataGo just fixed that bug by putting those positions into the training data. The reason it wasn’t in the training data is because the training data at first was just self-play games. When games that are losses for the AI from humans are included, the bug is fixed.
You’re not grasping the fundamental problem here.
This is like saying a calculator understands math because when you plug in the right functions, you get the right answers.
The AI grasps the strategic aspects of the game really well. To the point that if you don’t let it “read” deeply into the game tree, but only “guess” moves (that is, only use the policy network) it still plays at a high level (below professional, but strong amateur)
How does it “understand the strategic aspects of the game really well” if it can’t solve problems it hasn’t seen the answers to?
It doesn’t get fed answers in the training data, only positions. If it sees a position, it will eventually learn to solve it by itself
This is some real “what else besides witches floats in water” ass-logic
No, really, if you understood how the language models work, you would understand it’s not really intelligence. We just tend to humanize it because that’s what our brains do.
There’s a lot of great articles that summarize how we got to this stage and it’s pretty interesting. I’ll try to update this post with a link later.
I think LLMs are useful (and fun) and have a place, but intelligence they are not.
I’m still waiting for the definition of intelligence that won’t have the same moving of goalposts the Turing Test did
I’m happy with the Oxford definition: “the ability to acquire and apply knowledge and skills”.
LLMs don’t have knowledge as they don’t actually understand anything. They are algorithmic response generators that apply scores to tokens, and spit out the highest scoring token considering all previous tokens.
If asked to answer 10*5, they can’t reason through the math. They can only recognize 10, * and 5 as tokens in the training data that is usually followed by the 50 token. Thus, 50 is the highest scoring token, and is the answer it will choose. Things get more interesting when you ask questions that aren’t in the training data. If it has nothing more direct to copy from, it will regurgitate a sequence of tokens that sounds as close as possible to something in the training data: thus a hallucination.
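As a rough illustration of “apply scores to tokens and pick one”, here is a Python sketch with made-up scores for the token that follows `10*5=`. The numbers and the `softmax` helper are assumptions for illustration only; real models produce scores over a vocabulary of tens of thousands of tokens, and many deployments sample from the distribution rather than always taking the top token.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw token scores into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in scores.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(scores, exps)}


# Made-up scores for the next token after "10*5=" (purely illustrative).
scores = {"50": 6.0, "15": 2.0, "500": 1.5, "banana": -3.0}

probs = softmax(scores)

# Greedy decoding: always take the highest-scoring token.
greedy = max(probs, key=probs.get)

# Sampling: draw tokens in proportion to their probability, so a
# lower-scoring token can occasionally come out.
rng = random.Random(0)
sampled = rng.choices(list(probs), weights=list(probs.values()), k=5)

print(greedy)   # 50
print(sampled)
```

Greedy selection matches the “highest scoring token” description; the sampling line is where the randomness in generated output comes from.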
The human could be described in very similar terms. People think we’re magic or something, but we too are just a weighted neural network assembling outputs based strictly on training data built from reinforcement. We are just for the moment much much better with massive models. Of course that is reductive but many seem to forget that brains suffer similarly when outside of training data.
That’s a strong claim. Got an academic paper to back that up?
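I’m slightly confused. Which part needs an academic paper? I’ve made three admittedly reductive claims:
1. Human brains are neural networks.
2. Their outputs are based on training data built from reinforcement.
3. We have a much more massive model than current artificial networks.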
First, I’m not trying to make some really clever statement. I’m just saying there is a perspective where describing the human brain can generally follow a similar description. Nevertheless, let’s look at the only three assertions I make here. Given that the term neural network is given its namesake from the neurons that make up brains, I assume you don’t take issue with this. The second point, I don’t know if linking to scholarly research is helpful. Is it not well established that animals learn and use reward circuitry like the role of dopamine in neuromodulation? We also have… education, where we are fed information so that we retain it and can recount it down the road.
I guess maybe it is worth exploring the third, even though I really wasn’t intending to make a scholarly statement. Here is an article in Scientific American that gives the number of neural connections as around 100 trillion. Now, how that equates directly to model parameters is absolutely unclear, but even if you take glial cells, where the number can be as low as 40-130 billion according to “The search for true numbers of neurons and glial cells in the human brain: A review of 150 years of cell counting”, that number is in the same order of magnitude as current models’ parameters. So I guess, if your issue is that AI models are actually larger than the human brain, I guess maybe there is something cogent. But given that there is likely at least a 1000:1 ratio of neural connections to neurons, I just don’t think that is really fair at all.
So, first of all, thank you for the cogent attempt at responding. We may disagree, but I sincerely respect the effort you put into the comment.
The specific part that I thought seemed like a pretty big claim was that human brains are “simply” more complex neural networks and that the outputs are based strictly on training data.
“Is it not well established that animals learn and use reward circuitry like the role of dopamine in neuromodulation?”
While true, this is way too reductive to be a one-to-one comparison with LLMs. Humans have genetic instinct and a body-mind connection that isn’t cleanly mappable onto a neural network. For example, biologists are only just now scratching the surface of the link between the brain and the gut microbiome, which plays a much larger role in cognition than previously thought.
Another example where the brain = neural network model breaks down is the fact that the two hemispheres are much more separated than previously thought. So much so that some neuroscientists are saying that each person has, in effect, 2 different brains with 2 different personalities that communicate via the corpus callosum.
There’s many more examples I could bring up, but my core point is that the analogy of neural network = brain is just that, a simplistic analogy, on the same level as thinking about gravity only as “the force that pushes you downwards”.
To say that we fully understand the brain, to the point where we can even make a model of a mosquito’s brain (220,000 neurons), I think is mistaken. I’m not saying we’ll never understand the brain enough to attempt such a thing, I’m just saying that drawing a casual equivalence between mammalian brains and neural networks is woefully inadequate.
For what it’s worth, in spite of my poor choice of words and general ignorance on many topics, I agree with everything you said here, and find these fascinating topics. Especially that of our microbiome which I think by mass is larger than our brains; so who’s really doing the thinking around here?
That’s an obsolete description of what a mammal’s brain is.
Do you have a better one?
I could find a dozen better ones in google, but I’m not a neurophysiologist.
The important thing here is that neural nets do not describe the human brain.
Artificial neural nets no, but neural networks in general yes. Just because the computer version isn’t like the real thing doesn’t mean that humans do not use a type of neural network.
This can be intuitively understood if you’ve gone through difficult college classes. There’s two ways to prepare for exams. You either try to understand the material, or you try to memorize it.
The latter isn’t good for actually applying the information in the future, and it’s most akin to what an LLM does. It regurgitates, but it doesn’t learn. You show it a bunch of difficult engineering problems, and it won’t be able to solve different ones that use the same principle.
I think the definition is “whichever is more emotionally important to you.” So, in your case, they would be very, very intelligent.
LLMs aren’t even hallucinating though. It’s a euphemistic term to make their limitations sound human-like.