AI

GPT-4 Has Passed the Turing Test, Researchers Claim

Drew Turney reports via Live Science: The "Turing test," first proposed as "the imitation game" by computer scientist Alan Turing in 1950, judges whether a machine's ability to show intelligence is indistinguishable from a human. For a machine to pass the Turing test, it must be able to talk to somebody and fool them into thinking it is human. Scientists decided to replicate this test by asking 500 people to speak with four respondents, including a human and the 1960s-era AI program ELIZA as well as both GPT-3.5 and GPT-4, the AI that powers ChatGPT. The conversations lasted five minutes -- after which participants had to say whether they believed they were talking to a human or an AI. In the study, published May 9 to the pre-print arXiv server, the scientists found that participants judged GPT-4 to be human 54% of the time.

ELIZA, a system pre-programmed with responses but with no large language model (LLM) or neural network architecture, was judged to be human just 22% of the time. GPT-3.5 scored 50% while the human participant scored 67%. "Machines can confabulate, mashing together plausible ex-post-facto justifications for things, as humans do," Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told Live Science. "They can be subject to cognitive biases, bamboozled and manipulated, and are becoming increasingly deceptive. All these elements mean human-like foibles and quirks are being expressed in AI systems, which makes them more human-like than previous approaches that had little more than a list of canned responses."
Further reading: 1960s Chatbot ELIZA Beat OpenAI's GPT-3.5 In a Recent Turing Test Study


Comments Filter:
  • Teaching to the Test (Score:5, Interesting)

    by sound+vision ( 884283 ) on Friday June 14, 2024 @10:05PM (#64550593) Journal

    LLMs - The ultimate example of teaching to the test?

    • by SuperKendall ( 25149 ) on Friday June 14, 2024 @10:10PM (#64550595)

      Funny that the first and second posts are about the exact same thing: that if you were to construct from scratch a system with only one goal, to pass the Turing test, LLMs are how you would do it.

      They are built to output streams of words, each statistically the most likely next word given what other humans have written. So in response to almost anything, an LLM will produce a string of words that almost always looks and reads very human on the surface.
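      To make that concrete, here's a toy sketch of that idea (my own illustration; the bigram table is invented). Real LLMs learn statistics over subword tokens with a neural network, but the sampling loop has the same shape:

      import random

      # Toy bigram "model": counts of how often each word followed another.
      bigram_counts = {
          "i":     {"think": 5, "am": 3},
          "think": {"that": 6, "so": 4},
          "that":  {"is": 7, "was": 3},
          "is":    {"true": 4, "fine": 6},
      }

      def next_word(word):
          """Sample a next word in proportion to how often it followed `word`."""
          candidates = bigram_counts.get(word)
          if not candidates:
              return None  # no known continuation: stop
          return random.choices(list(candidates), weights=list(candidates.values()))[0]

      def generate(start, max_len=10):
          out = [start]
          while len(out) < max_len:
              w = next_word(out[-1])
              if w is None:
                  break
              out.append(w)
          return " ".join(out)

      print(generate("i"))  # e.g. "i think that is fine"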

      Not to mention that, just like real humans, an LLM is capable not just of complete fabrication, but also of doubling down on the fabrication if needed by providing more details of something completely made up!

      So it's no wonder an LLM was able to pass the Turing Test. But the question of more general ability to help is, I feel, still kind of up in the air, apart from targeted tasks.

    • by narcc ( 412956 )

      Eliza passed the Turing test. It's not that interesting. It doesn't tell us anything about the machine. All it does is highlight how ridiculous people can be.

      "Teaching to the test" is the whole game. It's not about making an intelligent machine, it's about making credulous humans think the machine is intelligent.

      • Re: (Score:3, Interesting)

        I guess it depends on the IQ of the person. It becomes clear pretty quickly that an ELIZA is not AI and certainly not a human.
        Or at least it takes some meta-knowledge, as in: OK, they want to test whether I can figure out that the other one is an AI ... how do I trick an AI into revealing itself?

        On the other hand, I once put an ELIZA into an IRC channel. It would drag people into a conversation if they mentioned its name, Pirx. Since I obviously made a new ElizaObject for each of Pirx's conversation partners, he never mixed anything up.

        It went ...

      • A human judging ELIZA to be a fellow human is not so much a case of ELIZA passing the Turing test, but of the human failing it.
        • A human judging ELIZA to be a fellow human is not so much a case of ELIZA passing the Turing test, but of the human failing it.

          That's easily said because we all know ELIZA's tricks. However, if you aren't trying to trip it up, it's very possible to have a "lucky" ELIZA session which just happens to match what you want to talk about with what ELIZA expects. You can go quite a few minutes before you get something obviously weird. A time-limited Turing test could easily be passed.

      • I feel like a lot of passed Turing tests lean pretty heavily on the novelty of the experience.
        When ELIZA came out, terminals were still a luxury and Adventure wasn't even a thing, so if you typed something into a computer and got anything even slightly resembling a human response, it's easy to see how a person could decide that's not how computers act, so it must be a person.

        ChatGPT seems very human when you first interact with it but once you know how modern LLMs behave you're much less likely to be fooled.

        • The other day I wanted to replace \n with ', ' at the end of every line of a file in Linux. It would only tell me about the tr command, because someone somewhere had recommended it for a different problem. Of course that was useless for me, because tr can only map characters 1:1.
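          For what it's worth, here's a minimal way to do that join without tr (a Python sketch rather than shell; "input.txt" is a placeholder filename):

          # Read all lines and join them with ", ".
          # tr(1) can't do this: it only maps single characters to single characters.
          with open("input.txt") as f:
              print(", ".join(f.read().splitlines()))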
        • by narcc ( 412956 )

          ChatGPT seems very human when you first interact with it but once you know how modern LLMs behave you're much less likely to be fooled.

          Exactly. As Joe Weizenbaum optimistically states in his 1966 paper ELIZA: A Computer Program For the Study of Natural Language Communication Between Man And Machine:

          "For in those realms machines are made to behave in wondrous ways, often sufficient to dazzle even the most experienced observer. But once a particular program is unmasked, once its inner workings are explained in language sufficiently plain to induce understanding, its magic crumbles away..."

          Though that might be overly optimistic. Some people, it seems, want to be fooled. Even some very smart and well-credentialed people working in the field.

        • I once showed someone a text based online game in the school library ... and they had trouble believing there could be a person at the other end. That was in the early days of the (public perception of the) internet, of course :)

      • It's not about making an intelligent machine, it's about making credulous humans think the machine is intelligent.

        ChatGPT can chat intelligently with me about building a moderately complex PHP application ... more intelligently than some coworkers I have had.

        It can also usually just build it ... given good requirements and feedback. Again, better than some coworkers I've had ...

        I don't know if that's "intelligent", in a philosophical sense, but it's pretty freakin' amazing, unless I move goalposts in massive fashion.

        • by narcc ( 412956 )

          ChatGPT can chat intelligently with me about building a moderately complex PHP application

          No, it can't. You're deluding yourself. This is nothing new. From Weizenbaum's 1966 ELIZA: A Computer Program for the Study of Natural Language Communication Between Man and Machine

          Some subjects have been very hard to convince that ELIZA is not human. This is a striking form of Turing's test

          My guess is that you really want the machine to be doing far more than it's doing, or to be capable of more than it is capable of doing. I'm not sure what that fantasy is, but I can say with absolute certainty that it is just a fantasy.

          It can also usually just build it ... given good requirements and feedback.

          Pay very close attention to the second part of that. You are the one building it, just in a needlessly roundabout way.

          • Shrug

            I've been a programmer for over 20 years ... you can try to convince me that a useful tool that I'm using isn't useful, but I doubt you'll succeed, lol.

            You seem to have skipped (or failed to grok, anyway) the key bit, "I don't know if that's "intelligent", in a philosophical sense, but it's pretty freakin' amazing,"

            It is pretty amazing, and the goalpost moving attempts to have it not be amazing are ... unconvincing.

            You are the one building it, just in a needlessly roundabout way.

            Of course I am ... with a tool. And no, actually, "taking way longer to do it without a tool" is what would be the needlessly roundabout way.

            • by narcc ( 412956 )

              you can try to convince me that a useful tool that I'm using isn't useful, but I doubt you'll succeed

              All I can do is toss you a rope. It's up to you to pull yourself out.

              You seem to have skipped (or failed to grok, anyway) the key bit,

              You should learn to read. I even quote directly from that line.

              And no, actually, "taking way longer to do it without a tool" is what would be the needlessly roundabout way.

              Sigh... You're not using a tool, you're playing with a toy. Again, the only reason you feel more productive is that you're more focused on your work because you've found a way to incorporate your new toy. You'll find yourself using your toy less and less as the novelty wears off and fighting with it stops being fun.

              I've seen this same silly scenario play out many times

      • No, no AI has passed the Turing test or even come close to it, not even this most recent one.

        The goal is not to fool one human (Eliza), or fool 50% of a group of idiots after 5 minutes. The test is to fool the best AI experts in the world over a long time.

        • by narcc ( 412956 )

          You're just making up whatever silly rules you want. Eliza easily passed the Turing test back in the 1960s. That was the whole point.

          Some subjects have been very hard to convince that ELIZA is not human. This is a striking form of Turing's test

          Weizenbaum in ELIZA: A Computer Program For the Study of Natural Language Communication Between Man And Machine, 1966

          I was startled to see how quickly and how very deeply people conversing with DOCTOR became emotionally involved with the computer and how unequivocally they anthropomorphized it

          ELIZA created the most remarkable illusion of having understood the minds of the many people who conversed with it

          Weizenbaum in Computer Power and Human Reason, 1976

          His secretary famously wanted her sessions with the machine confidential, convinced that the machine understood and empathized with her.

          Once my secretary, who had watched me work on the program for many months and therefore surely knew it to be merely a computer program, started conversing with it. After only a few interchanges with it, she asked me to leave the room.

          What I had not realized is that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people.

          ibid

          Eliza is still fooling people today. It's remarkable how such a s

          • The Turing test is duck typing of AI taken to its extreme.

            The logic is: if no one can tell something apart from human intelligence, then in this setting it _is_ human intelligence.

            Note the NO ONE in that sentence: it doesn't mean some idiot, it means everybody; or, if we are trying to be efficient about it, the people in the world who are best at telling whether something is an AI.

    • That speaks volumes about what you think of humans. Why can't we just make up some kind of test that would reveal it's an AI? I am sure we can.
      • Do you really think you can? Talk with thousands of different humans of all ages, races, education levels, etc. and you will see that a lot of people would probably fail that test too and be judged to be AI. Any new AI will be better than its predecessor and will be able to 'mimic' human behaviour; I expect GPT-6 or 7 even to be at such a level that it will be smarter than any human (with only a very few exceptions)
        • by HiThere ( 15173 )

          Define "smarter". There ARE reasonable definitions that would validate your assertion, but most of them wouldn't. LLMs don't even understand that the external world exists, much less understand it. (This *is* an artifact of their training, but they're trained the way they are because there's this huge source of "data" available for free.)

          Properly structured, the AI would have smaller modules handling specific data inputs, and other modules interconnecting those modules. And it would also need to experiment

          • Actually, proper AI based on neural networks learns exactly the same way humans do, only much faster (depending on the hardware behind it). So proper AI certainly can understand the external world, depending on the sensors it has access to. But AIs like GPT-4 are still shortcuts, as the hardware needed for proper AI is still not small/advanced enough to make it 'smart' enough on its own. But in the next 10 years it will be, thanks to current AI for extra advancements. Too many people still think that AI is no
    • An AI or a human can be easily spotted by using trigger words or employing general craziness. Or inserting words in another language and playing with the words.

      The problem with the result and the test in general is that people are not testing the system, but expect to have a believable conversation.

      So, there is a preset desire to be fooled.

    • At this time, ChatGPT may be a good companion for a lonely senior with dementia. Eventually it will be much better.
  • by crunchy_one ( 1047426 ) on Friday June 14, 2024 @10:14PM (#64550603)
    I'd hardly say that a 54% score is passing, but maybe the researchers were grading on a curve. The 5-minute test time smacks of tuning the test to get closer to the desired result. Overall, I'm unimpressed.
    • Yikes (Score:5, Insightful)

      by SuperKendall ( 25149 ) on Friday June 14, 2024 @10:18PM (#64550617)

      I'd hardly say that a 54% score is passing, but maybe the researchers were grading on a curve. The 5-minute test time smacks of tuning the test to get closer to the desired result.

      Just like AI itself, this story is overhyped and underperforms in the real world.

    • by XXongo ( 3986865 ) on Friday June 14, 2024 @10:31PM (#64550633) Homepage

      I'd hardly say that a 54% score is passing

      Agree. If the 500 people just said "human" or "AI" at random, the number of people who would guess it was human is 50%... plus or minus 4.5%.

      Not significant.
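      For reference, the back-of-the-envelope math behind that margin, under the simplifying assumption that all 500 verdicts concern a single witness (in the study they were split across four):

      import math

      n = 500   # verdicts
      p = 0.5   # proportion expected from coin-flip guessing

      # Standard error of a proportion, and the usual 95% interval.
      se = math.sqrt(p * (1 - p) / n)
      print(f"SE = {se:.2%}, 95% CI = +/- {1.96 * se:.1%}")
      # prints: SE = 2.24%, 95% CI = +/- 4.4%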

      • Humans are supposedly bad at choosing random numbers on a scale of 1 to 10 or 100, but are we really any better at making random binary choices?

        I'm actually curious what the breakdown would be if you got 500 people and just asked them, *before* the chat session, whether they thought they were going to talk to a human or a machine.

      • Your comment only makes sense if we were testing the interrogators, as when people take a multiple-choice exam and an average score of 1/(number of possible answers) equals random guessing.
        But we're not; we're testing the thing they have to render verdicts on (the quality of the exam, in the analogy). The only sensible comparison, then, is between the baseline (say, a tried and tested set of exam questions) and the novel thing we're interested in (a new exam written by an inexperienced teacher). The

    • by quantaman ( 517394 ) on Friday June 14, 2024 @10:37PM (#64550639)

      I'd hardly say that a 54% score is passing, but maybe the researchers were grading on a curve. The 5-minute test time smacks of tuning the test to get closer to the desired result. Overall, I'm unimpressed.

      Considering humans only scored 67%, the 54% doesn't bother me so much.

      But the 5-minute threshold is really low: if you look at the paper, the "conversations" are the equivalent of a series of 5 back-and-forth text messages. I suspect the AIs would break down in longer conversations.

      The other thing I noticed is that the AIs relied on a few tricks; I suspect that if the interrogators had some practice the AIs wouldn't do so well [youtu.be].

      • if you look at the paper the "conversations" are the equivalent of a series of 5 back and forth text messages

        This is incorrect. The screenshots are just excerpts and certainly do not reflect 5 minutes of conversation.

        More noteworthy is that ChatGPT was explicitly instructed to play a little dumb. This of course lowers expectations of the people interacting with the AI. The full instruction prompt (Appendix A):

        You are about to play a Turing Test game as part of an experiment
        you are taking part in as a worker on Prolific. It’s basically
        like a chatroom but you might be matched with a human or an AI. It
        seems lik

        • by ceoyoyo ( 59147 )

          I like how half of the instruction prompt is basically "don't write too good."

        • I'll bet you could get a LOT of hits with that as a description of a person in a dating app.

          Maybe it's time to retire the Turing Test, and introduce the Tinder Test.

          Jokes aside, I think I remember someone who was using ChatGPT to screen women on some dating apps!

          • Yep, apparently he fooled his now wife for months with it: https://gizmodo.com/guy-used-c... [gizmodo.com]

            I know Slashdot likes to shit on everything AI nowadays, but I think one of the main things we need to keep in mind is that most people aren't particularly .. impressive. We've shed a lot of blood, sweat, and tears to come up with the systems that allow us to cooperate and do great things as a collective in spite of all our incompetence and many, many individual flaws. And even then it's still an uphill battle in man

        • if you look at the paper the "conversations" are the equivalent of a series of 5 back and forth text messages

          This is incorrect. The screenshots are just excerpts and certainly do not reflect 5 minutes of conversation.

          In the paper they describe them as: "Figure 1: A selection of conversations between human interrogators (green) and witnesses (grey). One of these four conversations is with a human witness, the rest are with AI. Interrogator verdicts and ground truth identities are below (to allow readers to indirectly participate)."

          They definitely imply those are the whole conversations and they don't show anything more extensive elsewhere, and honestly, the numbers work out.

          Assume 30 seconds for the interrogator to think of

          • They definitely imply those are the whole conversations and they don't show anything more extensive elsewhere, and honestly, the numbers work out.

            If you look at figure 5, there is an "in-progress conversation" with 4 messages back and forth, with 2:43 visible: "the timer at the top shows time remaining in the game".

            So it's more like 137 s (5:00 minus 2:43) over 8 messages, ~= 17 s/message, which would work out to about 10 messages per participant in a 5-minute conversation.
            That is more than 5, but I do agree that it is not a lot of conversation.

            • I saw that bit as well; I assumed that was just a particularly fast-fingered interrogator. Really, they should have been clearer on that (assuming it wasn't explicitly mentioned in the text somewhere).

      • Curious why they wouldn't ask it presumptive shit like which part of Washington it grew up in, where it graduated, how old it was when its mommy and daddy got divorced. I'm not an expert on LLMs but something tells me they'd fuck that up pretty bad. The reason I think that is several fold:

        - LLMs rely on the prompts themselves to present the illusion of being intelligent. Try to, for example, picture what an AI chatbot might say with no prompt at all. Basically all it could do is either do something preprogr

      • But the 5-minute threshold is really low, if you look at the paper the "conversations" are the equivalent of a series of 5 back and forth text messages. I suspect the AIs would break down in longer conversations.

        "It took over 100 questions for Rachel, didn't it???"

      • You are correct.

        I've only watched one ChatGPT long-form "debate" (this one: https://www.youtube.com/live/O... [youtube.com] )

        but it's probably clear to anybody reading a transcript that it's machine intelligence.

        The five-minute limit is obviously there to skew the results, but perhaps that's enough time to learn why my phone refill didn't get credited?

        P.S. Your link carried the si= tracking parameter; worth stripping if you need this account to remain pseudonymous from Google and the people with access to its data (which is now linked in their TIA).

    • I'd hardly say that a 54% score is passing, but maybe the researchers were grading on a curve. The 5-minute test time smacks of tuning the test to get closer to the desired result. Overall, I'm unimpressed.

      If 54% is a failing grade, then 67% is almost a failing grade. If the 54% represents unimpressive failure, does the 67% mean that real humans are on the boundary of unimpressive failure?

      I see the close to 50% number as impressive. It means that humans are struggling to distinguish the machine from the human. A 100% doesn't indicate identification of a human subject but rather exposure of a rigged test.

      • If 54% is a failing grade, then 67% is almost a failing grade. If the 54% represents unimpressive failure, does the 67% mean that real humans are on the boundary of unimpressive failure?

        This seems like it should be an important factor in reaching a conclusion from this experiment. Either the test subjects are particularly bad at distinguishing humans from computers, the subjects were expecting human-like conversation (which would probably bias the results), or the human was intentionally trying to act like a computer.

    • Personally, if I wanted my AI to pass the Turing test, I'd design it to be clumsy and kind of stupid, with little factual knowledge except for weather info. GPT tries too hard.
      • by gweihir ( 88907 )

        In the regular Turing Test competitions, the machines that do best simulate kids with a mental disability.

    • I found the paper, which neither TFS nor TFA references or links to, here: https://arxiv.org/abs/2301.100... [arxiv.org]

      There's no data, code, or other resources; I couldn't find any mention of error margins; and the distribution graph at the end looks pretty "chunky," which I interpret as a certain degree of randomness in correctly identifying bots vs. humans.

      Either the Turing test is fundamentally flawed in practice because of participant error/inconsistency (called reliability in assessment, i.e. the same cand
    • by gtall ( 79522 )

      Nah, they merely compared the results with humans who are also prone to making shit up and figured it was close to human intelligence.

    • There's really an easy way to show that it doesn't pass the Turing test: converse on some very unrelated topics. If it comes off as having too much knowledge of way too many topics, it's a computer.
    • by gweihir ( 88907 )

      Yep, same here. Also, the human participants play a huge role. This says more about the limitations of the average human.

    • I'll just jump in and push a few things off the table, without prejudice.

      TRUMP COULDN'T PASS A TURING TEST.

      What does that say?
      • What Trump said the other day wouldn't pass the Turing test.

        QUOTE
        I said, “So there’s a shark ten yards away from the boat, ten yards or here. Do I get electrocuted? If the boat is sinking, water goes over the battery, the boat is sinking. Do I stay on top of the boat and get electrocuted? Or do I jump over by the shark and not get electrocuted?”.

        Because I will tell you, he didn’t know the answer. He said, “You know, nobody’s ever asked me that ques
    • I'd hardly say that a 54% score is passing, but maybe the researchers were grading on a curve.

      Humans only scored 67%

  • Nah... (Score:5, Insightful)

    by Turkinolith ( 7180598 ) on Friday June 14, 2024 @10:18PM (#64550613)
    I don't think a 54% really passes, you know? Like... a 54% is above halfway, but still an "F". Come back when you get at least a 60.
    • It would only be a pass if humans scored equivalently, the interrogators were all leading AI experts, and they had as much time as they wanted.

  • by skegg ( 666571 ) on Friday June 14, 2024 @10:19PM (#64550619)

    Feels like some dates I've been on ...

  • Got a min? (Score:4, Insightful)

    by markdavis ( 642305 ) on Friday June 14, 2024 @10:41PM (#64550641)

    >"The conversations lasted five minutes"

    I believe that is too short to really explore enough to make a meaningful guess, especially for typed interaction. And if it only fooled 54% of people, that seems even less impressive. Of course, the ACTUAL human was only believed to be human by 67%, so, again, I think it was not enough time. Imagine what a difference between 10 seconds, 1 minute, 5 minutes, 10 minutes, 20 minutes, 40 minutes would make in your judgement.

    • I dunno, most conversations I've had lately with actual human beings haven't been very convincing. Turing may not have foreseen the depressing direction of humanity. Today I watched a guy exit an interstate at 75 mph and cross three lanes of traffic to the right on the access road, his left turn signal on the whole time.

      The bar is very low.

  • by Anonymous Coward
    Alan Turing gets credit for doing cool shit before people realized it, but the Turing test is nonsense. No one who actually does ANN research cares about the Turing test.

    This is just a publicity stunt. Nothing to see here.

    • Isn't the point of the "Attention Is All You Need" paper that you don't even need ANNs anymore?

      • by ceoyoyo ( 59147 )

        Sure. Except that a transformer is arguably an ANN itself, and 2/3 of a typical "transformer" model is actually straight up perceptrons out of the 50s.

        I saw the title "Attention is about 1/3 of what you need" somewhere. Much more accurate.

    • The Turing Test is great: it will let you know whether you've created a human-level AI. That's why the people who make shitty AIs always cripple the test.

    • Most researchers don't care about the Turing Test because they don't care about developing an AI that can generally pass for human, they only care about more specific objectives. That doesn't make the Turing Test nonsense, it just isn't applicable to their research.
      • by HiThere ( 15173 )

        OTOH, the real problem with "The Turing Test" is that it's NEVER properly implemented. The only reason it's ever done is as a PR stunt. Of course, passing the Turing test isn't that useful. It was intended as an argument, not as something to actually be striven for.

  • If the turing test is meant to gauge if an artificial intelligence can pass as human, then we've definitely set the bar too low.

    If human beings are supposed to be the gold standard of intelligence on this planet, then I think the Universe also set the bar too low :|

    • by pacinpm ( 631330 )

      The way a human being can?

      And how do you know I enjoy strawberries the same way you do? Because I told you so?

  • The first is to build a computer that can carry on a conversation like a human.

    The second is to have it carry on a conversation with a human that's dumber than a box of rocks.

    One must always be careful to distinguish between correlation and causation.

  • "I believe that in about fifty years' time it will be possible, to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning."

    A. M. Turing (1950) Computing Machinery and Intelligence. Mind 49: 433-460.

  • ELIZA..was judged to be human just 22% of the time. GPT-3.5 scored 50% while the human participant scored 67%.

    The results of a Turing test today might just say far more about the decline in average intelligence since the 1950s than what the test seeks to validate. You can fool people rather easily these days (as evidenced by the 22% of humans who thought a 1960s chatbot was human). Machines will get smarter as a whole. Humanity? Not so much.

    Also, the actual human graded out at only 67% meatsack? Either the is-it-a-human testing standards are miscalibrated, or the “human” tested was made by Cyberdyne

    • Also, the actual human graded out at only 67% meatsack?

      If you look at Figure 1 in the paper, at least 1 human apparently got cute and tried to be deliberately misleading. The interviewees were (footnote 2):
      A: GPT-4
      B: Human
      C: GPT-3.5
      D: ELIZA

      I don't know why they chose to include this bit of conversation as the human example, but if a lot of the actual humans interviewed acted like this I can definitely see people going "AI" for them.

      • Also, the actual human graded out at only 67% meatsack?

        If you look at Figure 1 in the paper, at least 1 human apparently got cute and tried to be deliberately misleading. The interviewees were (footnote 2): A: GPT-4 B: Human C: GPT-3.5 D: ELIZA

        I don't know why they chose to include this bit of conversation as the human example, but if a lot of the actual humans interviewed acted like this I can definitely see people going "AI" for them.

        Perhaps the humans involved in test measurement forgot for a moment how easy it is to trigger the native human sarcasm gland when the environment is right.

        (Human, channeling Ace Ventura after reading their instructions) ”Oh, so you want me to ‘act human’ here? Allllrighty then..”

        It's ironic that in order for the machine to understand how smart we want it to be, it first has to accurately deduce exactly how stupid we can be at any given moment. It's going to be downright scary

    • I barely trust humans to handle a butter knife without injuring themselves.

  • But how does it do on the Voight-Kampff test?

  • by ET3D ( 1169851 ) on Saturday June 15, 2024 @03:00AM (#64550897)

    ChatGPT isn't designed to mimic humans. It might succeed better with some prompting, but it might also require specific training. It'd have to be prompted to go off on tangents, hold illogical beliefs and get insulted if you don't agree with them, and show other human conversational traits. Basically, it should be trained as AS (artificial stupidity) rather than AI to feel much closer to human.

    I imagine that some future games will have AI which is good at this.

  • That's what the test has finally proven: that in its current format it's unfit for purpose.
  • But that's usually how things progress: first outright fraud, then gimmicky tricks that evade rather than pass the test, and now we're in a phase of ambiguity. The next step is increasing clarity.
  • ELIZA, a system pre-programmed with responses but with no large language model
    That is not correct.

    ELIZA is a cleverly programmed linguistic hack (perhaps fewer than 500 lines of C, in a single file).

    Only when she is, linguistically speaking, out of clues does she fall back on preprogrammed answers.
    Preprogrammed answers would be, e.g.:
    - Tell me more about {it} (where {it} might be replaced by whatever ELIZA thinks the topic is, but it can also be a plain sentence)
    - That is interesting!
    - Why are you thinking like that?

    In general, however, ELIZA transforms statements into questions, using simple text-replacement rules.

    So you write: "My job is so boring." She does not know what "job" and "boring" mean, but she will transform "My" into "Your" and make a sentence like: "Why is your job (so) boring?"
    Then you give three answers, which she ignores for now.
    And she answers: "Tell me more about it."
    After you have given more answers, she will go back and, using simple linguistics-based text-replacement rules, transform some of your input into a new question.
    Sometimes she picks two sentences and says something like: "What is worse, having a boring job or a stupid boss?"

    And so on. There are not really many preprogrammed answers, and there is no real attempt to put any AI in it at all.

    You can probably even find a fully fledged ELIZA implementation via Wikipedia; a minimal sketch of the mechanism follows.
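    To illustrate, here is a toy sketch in Python of that transform-or-fall-back loop (my own illustration, not Weizenbaum's script; the rules and canned answers are invented):

    import random
    import re

    # First-person words get "reflected" into second person before reuse.
    REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

    # (pattern, response templates): {0}, {1} are filled from the match groups.
    RULES = [
        (re.compile(r"my (\w+) is (.+)", re.I),
         ["Why is your {0} {1}?", "Tell me more about your {0}."]),
        (re.compile(r"i am (.+)", re.I), ["Why are you {0}?"]),
    ]

    # Canned fallbacks for when no transformation rule matches.
    FALLBACKS = ["That is interesting!", "Why are you thinking like that?",
                 "Tell me more about it."]

    def reflect(fragment):
        """Swap first-person words for second-person ones ("my" -> "your")."""
        return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())

    def respond(statement):
        for pattern, templates in RULES:
            m = pattern.search(statement)
            if m:
                return random.choice(templates).format(*map(reflect, m.groups()))
        return random.choice(FALLBACKS)  # out of clues: canned answer

    print(respond("My job is so boring"))  # e.g. "Why is your job so boring?"
    print(respond("Nice weather today"))   # e.g. "That is interesting!"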

  • I would not even say that was legitimately the Turing Test.

    • by HiThere ( 15173 )

      In this example, only around 67% of them.

      Of course, this wasn't the real Turing test as originally proposed. That would be more difficult. But I'm not sure the AI wouldn't do better than the average human. (Well, I'm restricting "average human" to a native speaker of whatever language the test is conducted in. One of the questions Turing used as an example was asking the respondent to compose a poem. I forget whether he specified the style...but most people *could* manage a Limerick, if they thought of

      • That's quite a lot of false negatives. I suppose placing the bar higher doesn't make the test useless.
        Then there's the reverse Turing test: convince the jury you're merely software.

  • About two years ago people stopped using the Turing test, as LLMs started to fool humans. The new way to benchmark AIs is leaderboards, on which people compare different answers and choose which AI does better.

  • People are so dumb now a bowl of alphabet soup could pass the Turing Test.
  • Most humans have been dumber than computers since the introduction of calculators.
  • Computing Machinery and Intelligence [ox.ac.uk] by A. M. Turing

    "I propose to consider the question, 'Can machines think?' This should begin with definitions of the meaning of the terms 'machine' and 'think.'

    The definitions might be framed so as to reflect so far as possible the normal use of the words, but this attitude is dangerous. If the meaning of the words 'machine' and 'think' are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, 'Can machines think?' is to be sought in a statistical survey such as a Gallup poll. But this is absurd."
  • The Turing test is trivial compared to human vision.

  • Turing tests have always been based on questions that the technology of the day wouldn't be able to answer but a human theoretically could. The tests never *really* were able to distinguish between a computer and a human; they were just able to distinguish (to a degree) computers of a certain time period from humans. The fact that technology can now pass these tests is just a sign of the improvement in technology. It doesn't mean anything beyond that.

  • So far, Turing tests have focused on questions that a computer can't answer, but a human can.

    Maybe it's time to come up with questions that a computer *can* answer, but a human couldn't possibly be able to answer.

  • I wonder how it responds to un-aliving itself.
