Apple's study proves that LLM-based AI models are flawed because they cannot reason

Comments

  • Reply 21 of 43
    tundraboy Posts: 1,914 member
    hexclock said:
    Of course they can’t reason. It’s not a living mind. It’s the illusion of intelligence. 
    And of course those Boston Dynamics robot dogs can't run. It's not a living body. It's the illusion of running. Illusion, shmillusion. If it works, that's all I care about. Maybe you agree with me, you're just quibbling over a word.
    AI would not have spotted your straw man argument there when you tried to bolster your position by implying that running, as a computational problem, is just as complex as real-world logical thinking (i.e., general intelligence). Unfortunately, humans would.
  • Reply 22 of 43
    hexclock said:
    Of course they can’t reason. It’s not a living mind. It’s the illusion of intelligence. 
    Something doesn’t have to be alive to have true intelligence. Our brains are amazing instruments that hold many mysteries, but if we treat them as magical, we are holding ourselves back as scientists.

    byronl said:
    Did they test OpenAI's o1 models, specifically meant to be good at reasoning?
    No they didn’t :-(

    Using GPT o1-preview, a new reasoning-based model, the query in this article led to the following answer:

    To find out how many kiwis Oliver has, let’s add up the number of kiwis he picked each day:
    1. Friday: Oliver picks 44 kiwis.
    2. Saturday: He picks 58 kiwis.
    3. Sunday: He picks double the number he did on Friday, which is 44 × 2 = 88 kiwis.
    Now, let’s calculate the total: 44 + 58 + 88 = 190 kiwis.
    The mention of “Five of them were a bit smaller than average” doesn’t affect the total count unless specified otherwise. Therefore, Oliver has 190 kiwis.


  • Reply 23 of 43
    iadlibiadlib Posts: 117member
    Of course LLM AI can't reason, neither can most humans, but that doesn't stop us as a species.
  • Reply 24 of 43
    TheX_fr said:
    No need to go too far into scientific research and analysis to observe this fact. We asked ChatGPT-4o, the fully omniscient and omnipotent version according to OpenAI and Saint Altman, a very simple question that requires both logical reasoning and real language knowledge… Here's the answer:

    "Sure! Here are seven popular expressions with the first word having 7 letters and the second word having 5 letters:

    1. Lucky break
    2. Future plans
    3. Secret code
    4. Change form
    5. Digital age
    6. Genuine love
    7. Creative mind

    If you need more or something different, just let me know!"

    For sure we will let it know…


    Claude (Anthropic), which in my view is far better than ChatGPT, also failed to complete this task properly the first four times. But after some insisting it got a right answer. Funny.

    I agree, even with faults, these LLMs are very, very useful. You just need to double-check whatever they spit out.
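
    Incidentally, this is the sort of thing that's trivial to double-check mechanically. A minimal Python sketch (the phrases are copied from the quoted reply; the 7-letter/5-letter rule is the constraint given to the model), which shows that none of the seven suggestions actually satisfies it:

    # Check the "7-letter first word, 5-letter second word" constraint
    # against the phrases quoted above.
    phrases = [
        "Lucky break", "Future plans", "Secret code", "Change form",
        "Digital age", "Genuine love", "Creative mind",
    ]

    for phrase in phrases:
        first, second = phrase.split()
        ok = len(first) == 7 and len(second) == 5
        print(f"{phrase}: {len(first)}+{len(second)} letters -> {'OK' if ok else 'fails'}")

    # Output: all seven phrases fail the stated constraint.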
  • Reply 25 of 43
    Inability to reason is a feature, not a flaw.
  • Reply 26 of 43
    It is just a high maintenance bot that won’t eat imperfect kiwi.
  • Reply 27 of 43
    AppleZulu Posts: 2,181 member
    Remember the New York Times article where the reporter played with an LLM AI program and it declared its love for him and told him to leave his wife? This is the article that seemingly kicked off the popular-culture freak-out over AI. 

    If you go back and read that, particularly the ‘conversation’ transcript, it’s sophisticated in its use of language, but you can easily see the AI bouncing off its programming parameters and limits as the reporter tries to push it to do things it was programmed not to do, or simply didn’t have the data to devise a logical reaction. In those ways, it wasn’t that much different than the AI programs of the 1960s and 70s that seemed “human” just by evasively reflecting questions back in response to queries. 

    It’s the human instinct to see faces in inanimate objects that keeps making people hype LLM AI as more than it is. It’s a remarkable thing, the ability to dump massive amounts of data into a computer and have it slice, dice, recompile and regurgitate things based on plain language queries. It really is. But that’s not reasoning or originality, much less cognitive sentience. 
  • Reply 28 of 43
    y2an Posts: 230 member
    Duh! It’s simulated intelligence, not artificial…
  • Reply 29 of 43
    mike1 Posts: 3,442 member
    iOSDevSWE said:
    The article lacks fact checking and details, like when the tests were conducted on OpenAI's models and which model was used. When I perform the request, I get the following answer from ChatGPT-4o:

    Question: ” Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday. Of the kiwis picked on Sunday, five of them were a bit smaller than average. How many kiwis does Oliver have?”

    Answer: “ Let’s break this down:

    • On Friday, Oliver picks 44 kiwis.
    • On Saturday, he picks 58 kiwis.
    • On Sunday, he picks double the number of kiwis he did on Friday, so he picks 44 × 2 = 88 kiwis.

    The total number of kiwis he picks is:

    44 (Friday) + 58 (Saturday) + 88 (Sunday) = 190 kiwis.

    So, Oliver has 190 kiwis in total. The fact that five of the kiwis picked on Sunday are smaller doesn’t affect the total number.”

    Perfect answer!

    And I just received this response to the query.

    On Friday, Oliver picks 44 kiwis. 
    On Saturday, Oliver picks 58 kiwis. 
    On Sunday, Oliver picks double the number of kiwis he did on Friday, so 2 * 44 = 88 kiwis. 
    Five of the kiwis picked on Sunday were smaller than average, so Oliver picked 88 - 5 = 83 average-sized kiwis.
    In total, Oliver picked 44 + 58 + 83 = 185 kiwis. Therefore, Oliver has 185 kiwis.
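
    For reference, here is the intended arithmetic, with the “smaller than average” clause treated as the distractor it is (a minimal sketch; the second calculation reproduces the flawed 185 answer above):

    # The kiwi word problem from the quoted query, done explicitly.
    friday = 44
    saturday = 58
    sunday = 2 * friday          # "double the number he did on Friday" -> 88
    smaller_than_average = 5     # distractor: these kiwis were still picked

    total = friday + saturday + sunday
    print(total)                 # 190 -- the correct answer

    # The flawed response subtracts the distractor, giving 185.
    wrong = friday + saturday + (sunday - smaller_than_average)
    print(wrong)                 # 185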


  • Reply 30 of 43
    blastdoor Posts: 3,580 member
    The primary issue with LLM computing is the ridiculously high power requirements. It goes against all of the low power hardware development of the last couple of decades.
    This issue is overblown and misinterpreted. 

    I read that the energy requirement to train a model is ballpark similar to the total energy needed to raise two American humans to the age of 25. That's a lot of energy, but the cost of inference on the model is vastly smaller than the cost of inference for those two humans, and the model can be copied and reused without limit. 

    So I think it's almost certainly going to be the case that the total energy cost of accomplishing tasks with LLMs will be much lower than using humans. 
  • Reply 31 of 43

    “...still lack basic reasoning skills.”   

    Hahaha, kinda reminds me of Siri since day one til today. 

  • Reply 32 of 43
    hexclock said:
    Of course they can’t reason. It’s not a living mind. It’s the illusion of intelligence. 
    I’m legitimately interested in hearing the definition of intelligence from anyone who will offer it.
    How about a very basic definition: giving correct answers to the questions given.
  • Reply 33 of 43
    lukei Posts: 389 member
    A significant number of people I encounter can’t do basic maths without a calculator. Not all humans are equal.
  • Reply 34 of 43
    hexclock said:
    Of course they can’t reason. It’s not a living mind. It’s the illusion of intelligence. 
    I’m legitimately interested in hearing the definition of intelligence from anyone who will offer it.
    This is a very philosophical topic that is based on worldview instead of science. Trying to define what is life vs. non-life is a very difficult thing to do, let alone sentience, and cannot be proved with science alone. Whatever answer you get will be biased towards one’s worldview.
  • Reply 35 of 43
    tht Posts: 5,690 member
    blastdoor said:
    The primary issue with LLM computing is the ridiculously high power requirements. It goes against all of the low power hardware development of the last couple of decades.
    This issue is overblown and misinterpreted. 

    I read that the energy requirement to train a model is ballpark similar to the total energy needed to raise two American humans to the age of 25. That's a lot of energy, but the cost of inference on the model is vastly smaller than the cost of inference for those two humans, and the model can be copied and reused without limit. 

    So I think it's almost certainly going to be the case that the total energy cost of accomplishing tasks with LLMs will be much lower than using humans. 
    I’d like to read what you read there. As you describe it, it doesn’t make sense. 

    Are you saying that one training run to develop an inference model is equivalent to the energy needs of one human for 50 years (i.e., 2 humans for 25 years)? That sounds ridiculously expensive. 

    You can ballpark one human as needing about 2 MWh per year, so 100 MWh across 50 years for a human. That would be one training session that takes weeks to months for a billion-parameter model. 

    Where it would add up is if multiple training runs are being done simultaneously and perpetually in the future, and they want to get to trillion-parameter models, right?

    So, what are the assumptions here?
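
    Just to pin down the ballpark arithmetic from the numbers above (the training-energy figure in the sketch is a pure placeholder; what that number actually was in the original claim is exactly the open question):

    # Human side of the comparison, using the rough figures above.
    human_mwh_per_year = 2.0            # ballpark energy use of one human per year
    person_years = 2 * 25               # "two American humans to the age of 25"
    human_energy_mwh = human_mwh_per_year * person_years
    print(human_energy_mwh)             # 100 MWh

    # Training side: plug in whatever figure the original claim assumed.
    training_energy_mwh = 100.0         # placeholder, NOT a sourced number
    print(training_energy_mwh / human_energy_mwh)   # ratio implied by the claim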
  • Reply 36 of 43
    tht Posts: 5,690 member
    hexclock said:
    Of course they can’t reason. It’s not a living mind. It’s the illusion of intelligence. 
    I’m legitimately interested in hearing the definition of intelligence from anyone who will offer it.
    This is a very philosophical topic that is based on worldview instead of science. Trying to define what is life vs. non-life is a very difficult thing to do, let alone sentience, and cannot be proved with science alone. Whatever answer you get will be biased towards one’s worldview.
    In the context of LLMs, I think “intelligence” comes down to whether the LLM is capable of coming up with a good answer that is outside of its training data set. 

    Eventually, LLM developers will be successful at brute forcing it, where they will code it to look at all possible solutions outside of its training data and find a best fit. The trick is some kind of analyzer that weeds out the crap. 

    One example I’m thinking of is Einstein’s special theory of relativity. If you train an LLM with all physics data up to 1900, will an LLM with its “reasoning” or “intelligence” model come up with special relativity (postulate 2)?

    That has to be the ultimate answer to the Michelson-Morley experiment from years before, but how would code and a dataset know? The consequences of postulate 2 are absolutely mind-bending, well, space-time bending, and how would code know it is the correct answer?

    You can think of lots of examples like this. Intelligence is when correct conclusions arise that are outside of the training data. 

    A musing I have about AI is when we get to an AI “singularity”, something indistinguishable from a human, all it will teach us is that there is no such thing as “intelligence”. We are all just biological interpolating machines. Nothing special about it. Just a machine with responses based on our inputs cranked through our biological computers. 

    And hence, it destroys our idea of AI’s being alive because it is just a CMOS-based computer, built like our biological ones. Consequences abound for whether the soul exists and whether free will exists. 
  • Reply 37 of 43
    Marvin Posts: 15,486 moderator
    tht said:
    hexclock said:
    Of course they can’t reason. It’s not a living mind. It’s the illusion of intelligence. 
    I’m legitimately interested in hearing the definition of intelligence from anyone who will offer it.
    This is a very philosophical topic that is based on worldview instead of science. Trying to define what is life vs. non-life is a very difficult thing to do, let alone sentience, and cannot be proved with science alone. Whatever answer you get will be biased towards one’s worldview.
    In the context of LLMs, I think “intelligence” comes down to whether the LLM is capable of coming up with a good answer that is outside of its training data set. 

    Eventually, LLM developers will be successful at brute forcing it, where they will code it to look at all possible solutions outside of its training data and find a best fit. The trick is some kind of analyzer that weeds out the crap. 

    One example I’m thinking of is Einstein’s special theory of relativity. If you train an LLM with all physics data up to 1900, will an LLM with its “reasoning” or “intelligence” model come up with special relativity (postulate 2)?

    That has to be the ultimate answer to the Michelson-Morley experiment from years before, but how would code and a dataset know? The consequences of postulate 2 are absolutely mind-bending, well, space-time bending, and how would code know it is the correct answer?

    You can think of lots of examples like this. Intelligence is when correct conclusions arise that are outside of the training data. 
    This is a good description of the qualities of intelligence.

    Perhaps strategic decisions too. When a machine plays a human at chess or another game, the computing power can play out large numbers of outcomes and choose the best, but the human plays the moves without this data.

    tht said:
    hexclock said:
    Of course they can’t reason. It’s not a living mind. It’s the illusion of intelligence. 
    I’m legitimately interested in hearing the definition of intelligence from anyone who will offer it.
    This is a very philosophical topic that is based on worldview instead of science. Trying to define what is life vs. non-life is a very difficult thing to do, let alone sentience, and cannot be proved with science alone. Whatever answer you get will be biased towards one’s worldview.
    A musing I have about AI is when we get to an AI “singularity”, something indistinguishable from a human, all it will teach us is that there is no such thing as “intelligence”. We are all just biological interpolating machines. Nothing special about it. Just a machine with responses based on our inputs cranked through our biological computers. 

    And hence, it destroys our idea of AI’s being alive because it is just a CMOS-based computer, built like our biological ones. Consequences abound for whether the soul exists and whether free will exists. 
    The idea that humans aren't particularly special makes people uncomfortable and drives dismissiveness towards technological advances. Most people would be willing to accept that humans evolved from primitive mammals but don't consider those primitive mammals to have any special qualities, certainly no concept of a soul. These delusions of grandeur will be tested in the coming years.

    Humans designed machines around how our minds work - collections of neurons connected together like transistors accessing patterns etched into memory to form new data. There's something more to the structure than raw computing but it's an evolved structure of neurons that gives the impression that it's more than the sum of its parts. Sufficiently advanced technology that seems ethereal.
  • Reply 38 of 43
    tht Posts: 5,690 member
    Marvin said:
    tht said:
    hexclock said:
    Of course they can’t reason. It’s not a living mind. It’s the illusion of intelligence. 
    I’m legitimately interested in hearing the definition of intelligence from anyone who will offer it.
    This is a very philosophical topic that is based on worldview instead of science. Trying to define what is life vs. non-life is a very difficult thing to do, let alone sentience, and cannot be proved with science alone. Whatever answer you get will be biased towards one’s worldview.
    In the context of LLMs, I think “intelligence” comes down to whether the LLM is capable of coming up with a good answer that is outside of its training data set. 

    Eventually, LLM developers will be successful at brute forcing it, where they will code it to look at all possible solutions outside of its training data and find a best fit. The trick is some kind of analyzer that weeds out the crap. 

    One example I’m thinking of is Einstein’s special theory of relativity. If you train an LLM with all physics data up to 1900, will an LLM with its “reasoning” or “intelligence” model come up with special relativity (postulate 2)?

    That has to be the ultimate answer to the Michelson-Morley experiment from years before, but how would code and a dataset know? The consequences of postulate 2 are absolutely mind-bending, well, space-time bending, and how would code know it is the correct answer?

    You can think of lots of examples like this. Intelligence is when correct conclusions arise that are outside of the training data. 
    This is a good description of the qualities of intelligence.

    Perhaps strategic decisions too. When a machine plays a human at chess or another game, the computing power can play out large numbers of outcomes and choose the best, but the human plays the moves without this data.

    Hehe, yes, if you know how the software works, you could defeat the software. Like with Face ID, it does some melange of identifying a person's face by looking at the relative positions of the eyes, the length of the nose, and the relative position of the nose to the eyes. You can defeat Face ID's recognition of you by just pulling the tip of your nose to the side with a piece of tape. Humans are so much better at facial recognition and can tell it's just you with a piece of tape on your face. Similarly, you can't defeat IR-based human recognition by wearing patches of heat-reflective makeup or clothing.

    But the solution to this is just adding more data. More and more "correct" responses will be put into the training set, and that makes this illusion of intelligence expand more and more, to the point that it will take hard work to find edge cases. Even de novo insight is something that can be done. At minimum, it's going to be brute forced. Arms race.


  • Reply 39 of 43
    AppleZulu Posts: 2,181 member
    tht said:
    Marvin said:
    tht said:
    hexclock said:
    Of course they can’t reason. It’s not a living mind. It’s the illusion of intelligence. 
    I’m legitimately interested in hearing the definition of intelligence from anyone who will offer it.
    This is a very philosophical topic that is based on worldview instead of science. Trying to define what is life vs. non-life is a very difficult thing to do, let alone sentience, and cannot be proved with science alone. Whatever answer you get will be biased towards one’s worldview.
    In the context of LLMs, I think “intelligence” comes down to whether the LLM is capable of coming up with a good answer that is outside of its training data set. 

    Eventually, LLM developers will be successful at brute forcing it, where they will code it to look at all possible solutions outside of its training data and find a best fit. The trick is some kind of analyzer that weeds out the crap. 

    One example I’m thinking of is Einstein’s special theory of relativity. If you train an LLM with all physics data up to 1900, will an LLM with its “reasoning” or “intelligence” model come up with special relativity (postulate 2)?

    That has to be the ultimate answer to the Michelson-Morley experiment from years before, but how would code and a dataset know? The consequences of postulate 2 are absolutely mind-bending, well, space-time bending, and how would code know it is the correct answer?

    You can think of lots of examples like this. Intelligence is when correct conclusions arise that are outside of the training data. 
    This is a good description of the qualities of intelligence.

    Perhaps strategic decisions too. When a machine plays a human at chess or another game, the computing power can play out large numbers of outcomes and choose the best, but the human plays the moves without this data.

    Hehe, yes, if you know how the software works, you could defeat the software. Like with Face ID, it does some melange of identifying a person's face by looking at the relative positions of the eyes, the length of the nose, and the relative position of the nose to the eyes. You can defeat Face ID's recognition of you by just pulling the tip of your nose to the side with a piece of tape. Humans are so much better at facial recognition and can tell it's just you with a piece of tape on your face. Similarly, you can't defeat IR-based human recognition by wearing patches of heat-reflective makeup or clothing.

    But the solution to this is just adding more data. More and more "correct" responses will be put into the training set, and that makes this illusion of intelligence expand more and more, to the point that it will take hard work to find edge cases. Even de novo insight is something that can be done. At minimum, it's going to be brute forced. Arms race.


    I think this is a central concept to understand. Current AI design is based on the thing computers can actually do better than humans: collect, store and then instantly and accurately recall information. When you ask a chatbot to write "Green Eggs and Ham" in the style of Charles Dickens, it instantly accesses the entire (copyrighted) Dr. Seuss work, then accesses the complete works of Dickens, analyzes stylistic language patterns from the latter and then tells the Seuss story following those patterns. It does all that very quickly, and humans who recall both authors much more vaguely will still recognize the patterns and think it clever.

    What's interesting is that, with much less information recalled with much lower accuracy, the human reader can recognize what the AI program could do only because it had access to a much larger data set. 

    A human can also do a better job of the thing itself, with a much smaller data set as a resource. While it would take longer to accomplish, a human ninth grader could read parts of one or two Dickens stories, plus Green Eggs and Ham, and carry out the same task. Plus, if motivated, the human student would probably write something far more clever, with multiple layers of interest, along with added jokes and contextual cues referring to the class, teachers, student peers and the assignment itself. This is because the student (if properly motivated) can think creatively, while the AI cannot.

    Current AI can approximate many things a human can do, but at a lower quality, and only by using the brute force of rapidly accessing a much larger source of data.
  • Reply 40 of 43
    Marvin Posts: 15,486 moderator
    tht said:
    the solution to this is just adding more data. More and more "correct" responses will be put into the training set, and that makes this illusion of intelligence expand more and more, to the point that it will take hard work to find edge cases. Even de novo insight is something that can be done. At minimum, it's going to be brute forced. Arms race.
    And this can make new discoveries:

    https://www.technologyreview.com/2023/12/14/1085318/google-deepmind-large-language-model-solve-unsolvable-math-problem-cap-set/
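
    The setup described in that article pairs an LLM that proposes candidate programs with ordinary code that scores them and feeds the best back into the prompt. A rough schematic of that generate-and-evaluate loop (llm_propose and evaluate here are toy stand-ins, not any real API):

    # Schematic generate-and-evaluate loop: a model proposes candidates,
    # plain code scores them, and only the best survive each round.
    import random

    def evaluate(candidate):
        # Toy objective standing in for a real, programmatic scorer.
        return -abs(candidate - 42)

    def llm_propose(examples):
        # Toy stand-in for an LLM call conditioned on the best examples so far.
        best = max(examples, key=evaluate)
        return best + random.choice([-1, 0, 1])

    pool = [0, 10, 100]                     # seed candidates
    for _ in range(200):
        pool.append(llm_propose(pool))
        pool = sorted(pool, key=evaluate, reverse=True)[:5]   # keep top scorers

    print(pool[0])                          # converges toward the toy objective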

    While brute force seems different from what humans do, the sum total of human activity from billions of people every day is brute force. Many scientists work in parallel on the same problems, most failing and a handful succeeding, and each has very focused training.

    AI models are being trained on everything - jack of all trades, master of none. If they were trained like humans - the best physics model only on physics, the best medical model only on medical material, the best art model only on art, etc., with some cross-training - this might produce more reliable output.

    The best training procedure and training material is a whole problem by itself. Humans follow convention: we all know what seasons are because we live through them, we live by the day/night cycle, eat and sleep at regular times, go through curated education, and get jobs with very specific skill requirements, and these define people's personalities. ChatGPT, Claude, etc. aren't mostly artists, engineers, or doctors; they don't have a specialty.

    Humans are fed with uncompressed, high-quality, curated data for years in real time, with a processing engine tuned over thousands of years. This has been estimated at around 70 GB/day of data, which would mean an 8-year-old has taken in over 200 TB of high-quality data, with the most meaningful parts stored. The AI models are being trained on similar amounts, but it's data that's a byproduct of what people do and contains a ton of garbage. It's still amazing to see the kinds of answers they come up with at such an early stage of development.
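
    The back-of-the-envelope behind that figure, taking the 70 GB/day estimate at face value:

    # Rough check of the "over 200 TB by age 8" figure quoted above.
    gb_per_day = 70
    days_per_year = 365
    years = 8

    total_tb = gb_per_day * days_per_year * years / 1000   # GB -> TB
    print(round(total_tb))   # about 204 TB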

    They're also not meant to replace humans at everything, but at least to be very useful tools. Adobe showed off some impressive art features this week with Turntable and Scenic, where they can rotate artwork and build more controlled generated scenes. This can reduce human work from hours or days to seconds.