- 1 Post
- 261 Comments
to Technology@lemmy.world • AI agents wrong ~70% of time: Carnegie Mellon study • English • 1 point • 5 hours ago

It’s like you didn’t listen to anything I ever said, or you discounted everything I said as fiction, but everything your dear LLM said is gospel truth in your eyes. It’s utterly irrational. You have to be trolling me now.
… because this sign was made before the IBM PC was invented.
Language changes over time.
I think you’re mistaking sarcasm for insanity, and the reason you’re doing that is that you were already belittling their viewpoint quite fiercely, rejecting absolutely everything they said just because you disagree with their conclusion.
to Technology@lemmy.world • AI agents wrong ~70% of time: Carnegie Mellon study • English • 1 point • 20 hours ago

> it’s so good at parsing text and documents, summarizing
No. Not when it matters. It makes stuff up. The less you carefully check every single fucking thing it says, the more likely you are to believe some lies it subtly slipped in as it went along. If truth doesn’t matter, go ahead and use LLMs.
If you just want some ideas that you’re going to sift through, independently verify and check for yourself with extreme skepticism as if Donald Trump were telling you how to achieve world peace, great, you’re using LLMs effectively.
But if you’re trusting it, you’re doing it very, very wrong and you’re going to get humiliated because other people are going to catch you out in repeating an LLM’s bullshit.
to Fuck AI@lemmy.world • The shoehorning of llms into business use is just laziness. • 1 point • 1 day ago

I have a similar story, but it got worse and worse with the lies as it got through the table. I fought it for an hour, then I wrote a script instead.
to Technology@lemmy.world • AI agents wrong ~70% of time: Carnegie Mellon study • English • 1 point • 1 day ago

You’re better off asking one human to do the same task ten times. Humans get better and faster at things as they go along. A human is always slower than an LLM, but an LLM gets more and more likely to veer off on some flight of fancy, further and further from reality, the more it says to you. The chances of it staying factual in the long term are really low.
It’s a born bullshitter. It knows a little about a lot, but it has no clue what’s real and what’s made up, or it doesn’t care.
If you want some text quickly that sounds right, but you genuinely don’t care whether it is right at all, go for it: use an LLM. It’ll be great at that.
to Technology@lemmy.world • AI agents wrong ~70% of time: Carnegie Mellon study • English • 1 point • 1 day ago

I would be in breach of contract to tell you the details. How about you just stop trying to blame me for the clear and obvious lies that the LLM churned out, and start believing that LLMs ARE strikingly fallible, because, buddy, you have your head so far in the sand on this issue it’s weird.
The solution to the problem was to realise that an LLM cannot be trusted for accuracy even if the first few results are completely accurate; the bullshit will creep in. Don’t trust the LLM. Check every fucking thing.

In the end I wrote a quick script that broke the input up on tab characters and wrote the sentence. That’s how formulaic it was. I deeply regretted trying to get an LLM to use data.
The frustrating thing is that it is clearly capable of doing the task some of the time, but drifting off into FANTASY is its strong suit, and it doesn’t matter how firmly or how often you ask it to be accurate or use the input carefully. It’s going to lie to you before long. It’s an LLM. Bullshitting is what it does. Get it to do ONE THING only, then check the fuck out of its answer. Don’t trust it to tell you the truth any more than you would trust Donald J Trump to.
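The throwaway script described above might look something like this minimal sketch. The column layout and the sentence template here are invented for illustration; the point is only that a fixed template plus a tab split is deterministic in a way an LLM is not.

```python
def row_to_sentence(row: str) -> str:
    """Turn one tab-separated row into a fixed, formulaic sentence.

    The field names and the template are assumptions for illustration,
    not the original script's actual format.
    """
    name, value, year = row.rstrip("\n").split("\t")[:3]
    return f"{name} recorded {value} in {year}."


def table_to_sentences(text: str) -> str:
    """Convert a whole tab-separated table, one sentence per non-blank row."""
    return "\n".join(row_to_sentence(r) for r in text.splitlines() if r.strip())
```

Unlike the LLM, a script like this converts row 500 exactly the same way it converted row 1, which is the whole appeal.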
How can I subscribe to piefed.world users and communities etc from my lemmy.world account?
How do I subscribe to a user or community on piefed.world and see it in my lemmy.world feed?
to Technology@lemmy.world • AI agents wrong ~70% of time: Carnegie Mellon study • English • 1 point • 1 day ago

Whereas if you ask a human to do the same thing ten times, the probability that they get all ten right is astronomically higher than 0.0000059049.
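The number quoted above can be checked directly: it is the best agent’s benchmark success rate, rounded to 0.3, compounded over ten independent attempts.

```python
# Probability that ten independent attempts ALL succeed, assuming each
# succeeds with probability 0.3 (the rounded best-agent success rate
# from the CMU benchmark quoted below).
p_single = 0.3
p_all_ten = p_single ** 10
print(f"{p_all_ten:.10f}")  # prints 0.0000059049
```

So even the best model in the study would be expected to get a ten-task streak right about six times in a million tries, under this independence assumption.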
to Mildly Infuriating@lemmy.world • Car crashes have killed and seriously injured roughly the same number of people as shootings in Chicago this year. Only one of these things draws media attention. • English • 2 points • 1 day ago

I agree it was a dumb comparison to start off with.
I wasn’t the one who made it, but the license issue is the logical conclusion if OP insists on the comparison.
to Technology@lemmy.world • AI agents wrong ~70% of time: Carnegie Mellon study • English • 1 point • 1 day ago

Again with dismissing the evidence of my own eyes!
I wasn’t asking it to do calculations, I was asking it to put the data into a super formulaic sentence. It was good for the first couple of rows, then it would get stuck in a rut and start lying. It was crap. A seven year old would have done it far better, and if I’d told a seven year old that they had made a couple of mistakes and to check it carefully, they would have done.
Again, I didn’t read it in a fucking article, I read it on my fucking computer screen, so if you’d stop fucking telling me I’m stupid for using it the way it fucking told me I could use it, or that I’m stupid for believing what the media tell me about LLMs, when all I’m doing is telling you my own experience, you’d sound a lot less like a desperate troll or someone who is completely unable to assimilate new information that differs from your dogma.
to Technology@lemmy.world • AI agents wrong ~70% of time: Carnegie Mellon study • English • 10 points • 2 days ago

Wow. 30% accuracy was the high score!
From the article:

Testing agents at the office
For a reality check, CMU researchers have developed a benchmark to evaluate how AI agents perform when given common knowledge work tasks like browsing the web, writing code, running applications, and communicating with coworkers.
They call it TheAgentCompany. It’s a simulation environment designed to mimic a small software firm and its business operations. They did so to help clarify the debate between AI believers who argue that the majority of human labor can be automated and AI skeptics who see such claims as part of a gigantic AI grift.
The CMU boffins put the following models through their paces and evaluated them based on task success rates. The results were underwhelming.
⚫ Gemini-2.5-Pro (30.3 percent)
⚫ Claude-3.7-Sonnet (26.3 percent)
⚫ Claude-3.5-Sonnet (24 percent)
⚫ Gemini-2.0-Flash (11.4 percent)
⚫ GPT-4o (8.6 percent)
⚫ o3-mini (4.0 percent)
⚫ Gemini-1.5-Pro (3.4 percent)
⚫ Amazon-Nova-Pro-v1 (1.7 percent)
⚫ Llama-3.1-405b (7.4 percent)
⚫ Llama-3.3-70b (6.9 percent)
⚫ Qwen-2.5-72b (5.7 percent)
⚫ Llama-3.1-70b (1.7 percent)
⚫ Qwen-2-72b (1.1 percent)

“We find in experiments that the best-performing model, Gemini 2.5 Pro, was able to autonomously perform 30.3 percent of the provided tests to completion, and achieve a score of 39.3 percent on our metric that provides extra credit for partially completed tasks,” the authors state in their paper.
to Technology@lemmy.world • AI agents wrong ~70% of time: Carnegie Mellon study • English • 1 point • 2 days ago

> Why are you giving it data

Because there’s a button for that.

> Its output is dependent on the input

This thing that you said… It’s false.
to Mildly Infuriating@lemmy.world • Car crashes have killed and seriously injured roughly the same number of people as shootings in Chicago this year. Only one of these things draws media attention. • English • 6 points • 2 days ago

If guns are so alike to cars, why not require a license that you get by passing a written test on gun safety and a practical test on basic competence and safe usage?
to Technology@lemmy.world • AI agents wrong ~70% of time: Carnegie Mellon study • English • 1 point • 2 days ago

It’s not completely random, but I’m telling you it fucked up, it fucked up badly, time after time, and I had to check every single thing manually. Its correct runs never lasted beyond a handful. If you build something using some equation it invented, you’re insane and should quit engineering before you hurt someone.
to Lemmy Shitpost@lemmy.world • What sort of grill needs a firmware update lol • 11 points • 2 days ago

The same kind of grill that can be bricked remotely if you stop paying for software updates.
I already told you my experience of the crapness of LLMs and even explained why I can’t share the prompt etc. You clearly weren’t listening or are incapable of taking in information.
There’s also all the testing done by the researchers in the article we’re discussing, which you’re also irrationally dismissing.
You have extreme confirmation bias.
Everything you hear that disagrees with your absurd faith in the accuracy of the extreme blagging of LLMs gets dismissed for any excuse you can come up with.