Chat GPT appears to hallucinate or outright lie about everything

Buttflapper@lemmy.world · 10 months ago

Chat GPT appears to hallucinate or outright lie about everything

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 🏆@yiffit.net · edit-2 10 months ago

Imagine text gen AI as just a big hat filled with slips of paper and when you ask it for something, it’s just grabbing random shit out of the hat and arranging it so it looks like a normal sentence.

Even if you filled it with only good information, it will still cross those things together to form an entirely new and novel response, which would invariably be wrong as it mixes info about multiple subjects together even if all the information individually was technically accurate.

They are not intelligent. They aren’t even better than similar systems that existed before LLMs!

db0@lemmy.dbzer0.com · edit-2 10 months ago

Do not expect anything factual from llms. This is the wrong use case. You can role play with them if you guide them sufficiently and they can help with sone tasks like programming if you already know what you want but want to save time writing it, but anything factual is out of their scope.

JustAnotherKay@lemmy.world · 10 months ago

If you already know what you want but want to save time writing it

IME, going to ChatGPT for code usually meant losing time, cause I’d go back and forth trying to get a usable snippet and it would just keep refactoring the same slop that didn’t work in its first attempt

thebestaquaman@lemmy.world · 10 months ago

In general I agree: ChatGPT sucks at writing code. However, when I want to throw together some simple stuff in a language I rarely write, I find it can save me quite some time. Typical examples would be something like

“Write a bash script to rename all the files in the current directory according to <pattern>”, “Give me a regex pattern for <…>”, or “write a JavaScript function to do <stupid simple thing, but I never bothered to learn JS>”

Especially using it as a regex pattern generator is nice. It can also be nice when learning a new language and you just need to check the syntax for something- often quicker than swimming though some Geeks4Geeks blog about why you should know how to do what you’re trying to do.

JustAnotherKay@lemmy.world · 10 months ago

Using an AI as a regex checker is so smart and I’m mad it never occured to me that it was possible lol. I’ve just been pouring over random forum posts for it

db0@lemmy.dbzer0.com · 10 months ago

The free version is pretty braindead nowadays. Early on it was quite better.

CoggyMcFee@lemmy.world · 10 months ago

When I have it integrated into my development environment a la Copilot, predicting the next block of code I’m going to write (which I can use if it is relevant and ignore if not), I find it to be a huge timesaver.

oakey66@lemmy.world · 10 months ago

Same experience. It can serve as a starting point but usually I have to sift through so many bad answers until something usable is made available.

subignition@piefed.social · edit-2 10 months ago

They’re pretty reasonable for consensus-based programming prompts as well like “Compare and contrast popular libraries for {use case} in {language}” or “I want to achieve {goal/feature} in {summary of project technologies}, what are some ways I could structure this?”

Of course you still shouldn’t treat any of the output as factual without verifying it. But at least in the former case, I’ve found it more useful than traditional search engines to generate leads to look into, even if I discard some or all of the specific information it asserts

Edit: Which is largely due to traditional search engines getting worse and worse in recent years, sadly

sircac@lemmy.world · 10 months ago

What would you expect from a word predictor, a knife is mostly useless for nailing, you are using them for the wrong purpose…

TimeSquirrel@kbin.melroy.org · 10 months ago

What? You don’t have a set of cutting hammers in the kitchen?

Christer Enfors@lemm.ee · 10 months ago

Lol, of course not.

… my cutting hammers are in the bathroom.

subignition@piefed.social · 10 months ago

Damn, you’re living in the future. I’m still stuck using three shells.

Dasus@lemmy.world · 10 months ago

“Converted what I said into the truth”

Now I’m not against the point you’re making in any way, I think the bots are hardcore yes men.

Buut… I have a 1060 and I got it around when No Man’s Sky came out, and I did try it on my 4k LED TV. It did run, but it also stuttered quite a bit.

Now I’m currently thinking of updating my card, as I’ve updated the rest of the PC last year. A 3070 is basically what I’m considering, unless I can find a nice 4000 series with good VRAM.

My point here being that this isn’t the best example you could have given, as I’ve basically had that conversation several times in real life, exactly like that, as “it runs” is somewhat subjective.

LLM’s obviously have trouble with subjective things, as we humans do too.

But again, I agree with the point you’re trying to make. You can get these bots to say anything. It amused me that the blocks are much more easily circumvented just by telling them to ignore something or by talking hypothetically. Idk but at least very strong text based erotica was easy to get out of them last year, which I think should not have been the case, probably.

webghost0101@sopuli.xyz · 10 months ago

This is an issue with all models, also the paid ones and its actually much worse then in the example where you at least expressed not being happy with the initial result.

My biggest road block with AI is that i ask a minor clarifying question. “Why did you do this in that way?” Expecting a genuine answer and being met with “i am so sorry here is some rubbish instead. “

My guess is this has to do with the fact that llms cannot actually reason so they also cannot provide honest clarification about their own steps, at best they can observe there own output and generate a possible explanation to it. That would actually be good enough for me but instead it collapses into a pattern where any questioning is labeled as critique with logical follow up for its assistant program is to apologize and try again.

Tellore@lemmy.world · 10 months ago

I’ve also had similar problem, but the trick is if you ask it for clarifications without it sounding like you imply them wrong, they might actually try to explain the reasoning without trying to change the answer.

webghost0101@sopuli.xyz · 10 months ago

I have tried to be more blunt with an underwhelming succes.

It has highlighted some of my everyday struggles i have with neurotypicals being neurodivergent. There are lots of cases where people assume i am criticizing while i was just expressing curiosity.

vxx@lemmy.world · 10 months ago

I think we shouldn’t expect anything other than language from a language model.

linearchaos@lemmy.world · 10 months ago

I don’t want to sound like an AI fanboy but it was right. It gave you minimum requirements for most VR games.

No man Sky’s minimum requirements are at 1060 and 8 gigs of system RAM.

If you tell it it’s wrong when it’s not, it will wake s*** up to satisfy your statement. Earlier versions of the AI argued with people and it became a rather sketchy situation.

Now if you tell it it’s wrong when it’s wrong, It has a pretty good chance of coming back with information as to why it was wrong and the correct answer.

VinS@sh.itjust.works · 10 months ago

Well I asked some questions yesterday about classes of DAoC game to help me choose a starter class. It totally failed there attributing skills to wrong class. When poking it with this error it said : you are right, class x don’t do Mezz, it’s the speciality of class Z.

But class Z don’t do Mezz either… I wanted to gain some time. Finally I had to do the job myself because I could not trust anything it said.

linearchaos@lemmy.world · 10 months ago

God I loved DAoC, Play the hell of it back in it’s Hey Day.

I can’t help but think it would have low confidence on it though, there’s going to be an extremely limited amount of training data that’s still out there. I’d be interested in seeing how well it fares on world of Warcraft or one of the newer final fantasies.

The problem is there’s as much confirmation bias positive is negative. We can probably sit here all day and I can tell you all the things that it picks up really well for me and you can tell me all the things that it picks up like crap for you and we can make guesses but there’s no way we’ll ever actually know.

VinS@sh.itjust.works · 10 months ago

I like it for brainstorming while debbuging, finding funny names, creating stories “where you are the hero” for the kids or things that don’t impact if it’s hallucinating . I don’t trust it for much more unfortunately. I’d like to know your uses cases where it works. It could open my mind on things I haven’t done yet.

DAoC is fun, playing on some freeshard (eden actually, started one week ago, good community)

filister@lemmy.world · edit-2 10 months ago

And you as an analytics engineer should know that already? I am using some LLMs on almost a daily basis, Gemini, OpenAI, Mistral, etc. and I know for sure that if you ask it a question about a niche topic, the chances for the LLM to hallucinate are much higher. But also to avoid hallucinating, you can use different prompt engineering techniques and ask a better question.

Another very good question to ask an LLM is what is heavier one kilogram of iron or one kilogram of feathers. A lot of LLMs are really struggling with this question and start hallucinating and invent their own weird logical process by generating completely credibly sounding but factually wrong answers.

I still think that LLMs aren’t the silver bullet for everything, but they really excel in certain tasks. And we are still in the honeymoon period of AIs, similar to self-driving cars, I think at some point most of the people will realise that even this new technology has its limitations and hopefully will learn how to use it more responsibly.

bane_killgrind@slrpnk.net · 10 months ago

They seem to give the average answer, not the correct answer. If you can bound your prompt to the range of the correct answer, great

If you can’t bind the prompt it’s worse than useless, it’s misleading.

paraphrand@lemmy.world · 10 months ago

Those first set of specs it quoted are actually the original min specs that Oculus and Valve promoted for the Rift and Vive when they were new.

Ever since then there have not been new “official” min specs. But it’s true that higher spec if better and that newer headsets are higher res and could use higher spec stuff.

Also, a “well actually” on this would be that those are the revised min specs that were put out a few years after the initial specs. It use to be a GTX 970 was min spec. But they changed that to the 1060.

What is failing here is the model actually being smart. If it was smart it would have reasoned that time moves on and it would have considered better mins pecs for current hardware. But instead it just regurgitated the min specs that were once commonly quoted by Oculus/Meta and Valve.

ipkpjersi@lemmy.ml · edit-2 10 months ago

I could have said anything, and then it would have agreed with me

Nope, I’ve had it argue with me, and I kept arguing my point but it kept disagreeing, then I realized I was wrong. I felt stupid but I learned from it.

It doesn’t “know” anything but that doesn’t mean that it can’t be right.

WolfLink@sh.itjust.works · 10 months ago

It’s actually not really wrong. There are many VR games you can get away with low specs for.

Yes when you suggested a 3070 it just took that and rolled with it.

It’s basically advanced autocomplete, so when you suggest a 3070 it thinks the best answer should probably use a 3070. It’s not good at knowing when to say “no”.

Interesting it did know to come up with a newer AMD card to match the 3070, as well as increasing the other specs to more modern values.

Petter1@lemm.ee · 10 months ago

For such questions you need to use a LLM that can search the web and summarise the top results in good quality and shows what sources are used for which parts of the answer. Something like copilot in bing.

emmy67@lemmy.world · 10 months ago

Or, the words “i don’t know” would work

Petter1@lemm.ee · 10 months ago

I don’t think LLM can do that very well, since there are very little people on the internet admitting that they don’t know about anything 🥸😂

Funny thing is, that the part of the brain used for talking makes things up on the fly as well 😁 there is great video from Joe about this topic, where he shows experiments done to people where the two brain sides were split.

https://youtu.be/_TYuTid9a6k?si=PylqvQ24QHWw_6PN

emmy67@lemmy.world · 10 months ago

Funny thing is, that the part of the brain used for talking makes things up on the fly as well 😁 there is great video from Joe about this topic, where he shows experiments done to people where the two brain sides were split.

Having watched the video. I can confidently say you’re wrong about this and so is Joe. If you want an explanation though let me know.

Petter1@lemm.ee · edit-2 10 months ago

Yes please! Hope you commented that on Joe‘s Video so he can correct himself in a coming video

SomeGuy69@lemmy.world · 10 months ago

People would move to the competition LLM that does always provide a solution, even if it’s wrong more often. People are often not as logical and smart as you wish.

r_se_random@sh.itjust.works · 10 months ago

The copilot app doesn’t seem to be any better.

r_se_random@sh.itjust.works · 10 months ago

Petter1@lemm.ee · 10 months ago

At least it gives you links to validate the info it serves you I’d say. LLM can do nothing about bad search results, the search algorithm works a bit different and is its own machine learning process.

But I just recognised, that chatGPT as well can search the web, if you prompt in the right way, and then it will give you the sources as well

r_se_random@sh.itjust.works · 10 months ago

But that also discredits me from ever asking an LLM a question which I don’t already know the answer to. If I have to go through the links to get my info, we already have search engines for it.

The entire point of LLM with Web search was to summarise the info correctly which I have seen them fail at, continuously and hilariously.

Petter1@lemm.ee · 10 months ago

Yea, but I prefer just writing what I am thinking instead of keywords. And more often than not, it feels like I get to answer more quickly as if I just used a search engine. But of course, I bet there are multiple people, that find stuff faster on web search engines, than me with LLM, it is just for me the faster way to find what I search.

Kazumara@discuss.tchncs.de · 10 months ago

It did not simply analyze the best type of graphics card for the situation.

Yes it certainly didn’t: It’s a large language model, not some sort of knowledge engine. It can’t analyze anything, it only generates likely text strings. I think this is still fundamentally misunderstood widely.

leftzero@lemmynsfw.com · 10 months ago

I think this is still fundamentally misunderstood widely.

The fact that it’s being sold as artificial intelligence instead of autocomplete doesn’t help.

Or Google and Microsoft trying to sell it as a replacement for search engines.

It’s malicious misinformation all the way down.

Christer Enfors@lemm.ee · 10 months ago

Agreed. As far as I know, there is no actual artificial intelligence yet, only simulated intelligence.

Brkdncr@lemmy.world · 10 months ago

TIL ChatGPT is taking notes off my ex.

Oka@sopuli.xyz · 10 months ago

If I narrow down the scope, or ask the same question a different way, there’s a good chance I reach the answer I’m looking for.

https://chatgpt.com/share/ca367284-2e67-40bd-bff5-2e1e629fd3c0