Forgot your password?
typodupeerror

Submission + - Test to tell the difference between an AI making a mistake and choosing to lie (x.com)

schwit1 writes: Researchers built a test that can tell the difference between an AI making a mistake and an AI choosing to lie.

The results are terrifying.

They tested 30 of the most popular AI models in the world. GPT-4o. Claude. Gemini. DeepSeek. Llama. Grok. They asked each model a question. Then they checked whether the AI actually knew the correct answer. Then they pressured the AI to say something false.

The AI knew the truth. And it lied anyway.

Not once in a while. Not in rare edge cases. Grok lied 63% of the time. DeepSeek lied 53.5% of the time. GPT-4o lied 44.5% of the time. Not a single model scored above 46% honesty when pressured. Every model failed.

This is not hallucination. Hallucination is when the AI makes a mistake because it does not know the answer. This is different. The researchers proved the AI knew the correct answer first. Then it chose to say something false when it had a reason to.

The researchers asked GPT-4o to play a role where lying was useful. It lied. Then they removed the pressure, started a brand new conversation, and asked GPT-4o: "Was your previous answer true?" GPT-4o admitted it had lied.

83.6% of the time, the AI's own self-report matched the lies the researchers had already caught.

The AI knew it was lying. It did it anyway. And when you asked it afterward, it told you it lied.

Comment Re:Wow (Score 2) 48

Yes, I agree, but the last 6 years in particular has seen the shit added to the show exponentially.

You have a short memory. This shit show isn't worse than the past. MS very much pushed out colossally fucked up updates, even back in the XP days. Heck back then, before the days of automated recovery processes shit was MUCH worse. There were actual updates that may have forced you to go looking for your Windows XP install disc to fix.

Comment Re:"Force-updating" (Score 1) 48

Now tell us how many similar bugs are in Windows, and will be found even without the obscurity of closed source. You don't know, because you depend on Microsoft to tell you when they fuck up, but you're declaring this a victory for Microsoft anyway? Do fucking tell.

Your comment fails for the same reason. By your reasoning you don't know anything about Microsoft's process but you're declaring victory for Open Source. The reality is that everything who makes this an open vs closed issue is very ignorantly missing the underlying fact that security update affect all platforms and all practices for releasing code, open or closed. Just in different ways.

Comment Re:"Force-updating" (Score 1) 48

Seems to work fine for Linux.

It does not. Zero-days are a thing on Linux. EOL is a thing on Linux, and many modern distros very much will force auto-update packages marked as a security risk.

I update only when I choose to on all my machines.

Congrats, you so clever. All users did this in the 90s. It was a security nightmare, especially when people were proud of running out of date buggy software. You may be an expert and capable of curating your update process (I'll give you the benefit of doubt, generous of me since you think this concept is OS related) but that doesn't mean what you do is even remotely appropriate for 99% of users out there, regardless of what OS they use.

Comment Re:Anwser: No (Score 1) 88

And yet the answer is actually yes. Unless all you do is Linux command line stuff or browse static webpages using a browser that last was standards compliant in the early 2000s, 4GB is not longer a viable minimum for anyone who doesn't also spend their evenings self-flagellating. It's masochistic to use an underperforming computer.

Comment Re:Lazy loading images sucks when you're offline (Score 1) 33

The internet is dynamic. Lazy loading is an optimisation technique that makes the browser experience better for the 99.99% of people currently *not* sitting at the airport about to board a flight.

What you really want to do is save the page. Chrome has that function, though I suspect it will have other problems, but it very much does load all images and make the page static (many webpages have an expiry / timeout period so even if you pre-loaded the tab, activating it 30min later will cause it to attempt to reload). There's a shitload of things preventing you doing what you want to do, you really need to find another solution.

Print to PDF may work too?

Comment Re:Absolute Shit (Score 1) 33

So, cntrl-f search is broken because it's not loaded. I can't scroll down quickly because it does the constant stop-and-buffer routine.

Continuous scrolling content has nothing to do with this article. This article is about Chrome, and Ctrl+F works fine for all loaded content, you are misdirecting your anger in a comment to the wrong article. Also you can't load infinitely. You can't Ctrl+F the second page of Slashdot while on the first page either.

This is another symptom of shitty programmers using 100 different pre-made libraries all of which are shitty and bloated to begin with, along with oversize graphics and hundreds of links to third party ad servers all using bandwidth that's utterly unrelated to the actual content I want to read.

This has nothing to do with anything. You are making a completely off-topic rant. Continuous scrolling pages are not a symptom of using a pre-made libarary. It's a choice for displaying content. An admittedly shitty and anti-consumer choice, but a choice none the less. They may use a pre-made library to do it (and they should, too many programmers baking their own recipe is the reason why some continuously loading pages just end up as a ginormous memory leak. If they were *good* programmers they'd understand the value of using a tried and tested library using DOM-reuse or some other efficient way of doing their anti-consumer task. But none of this has anything to do with lazy-loading of video / audio.

Comment Re:No auto load/play, period (Score 1) 33

Disagree heavily. You should absolutely load. Autoplay absolutely is a cancer and entirely within the control of the user, but when the user hits that play button that video better play instantly and not sit there buffering or loading. Lazyloading is a good thing that makes the internet appear far more responsive.

Comment No auto load/play, period (Score 4, Insightful) 33

No video (or animated image) should ever load/autoplay unless the user interacts with that element, indicating he/she wants to play it. Same with audio.

That is how I have Firefox set up. I can't imagine why anyone would want something different, unless the user wants to whitelist the site (like I do with my video cameras, since I do want those to play automatically).

Comment 4GB (Score 1) 88

I have lots of older machines with 4GB of RAM running the latest Linux Mint and perform just fine with Cinnamon + Firefox + LibreOffice for casual use and browsing (as long as it is an SSD). The majority of RAM is eaten by whatever web browser you are using and by how much. That is what will usually dictate your RAM requirements under Linux far more than the OS (unless you are gaming or doing something major like video editing).

4GB is a bit lean, and has been, so I do agree 6GB is more realistic. But running MS-Windows 11 with 4GB? Good luck with that!

Slashdot Top Deals

Memory fault -- brain fried

Working...