AI

GPT-4 Has Passed the Turing Test, Researchers Claim

Drew Turney reports via Live Science: The "Turing test," first proposed as "the imitation game" by computer scientist Alan Turing in 1950, judges whether a machine's ability to show intelligence is indistinguishable from a human. For a machine to pass the Turing test, it must be able to talk to somebody and fool them into thinking it is human. Scientists decided to replicate this test by asking 500 people to speak with four respondents, including a human and the 1960s-era AI program ELIZA as well as both GPT-3.5 and GPT-4, the AI that powers ChatGPT. The conversations lasted five minutes -- after which participants had to say whether they believed they were talking to a human or an AI. In the study, published May 9 to the pre-print arXiv server, the scientists found that participants judged GPT-4 to be human 54% of the time.

ELIZA, a system pre-programmed with responses but with no large language model (LLM) or neural network architecture, was judged to be human just 22% of the time. GPT-3.5 scored 50% while the human participant scored 67%. "Machines can confabulate, mashing together plausible ex-post-facto justifications for things, as humans do," Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told Live Science. "They can be subject to cognitive biases, bamboozled and manipulated, and are becoming increasingly deceptive. All these elements mean human-like foibles and quirks are being expressed in AI systems, which makes them more human-like than previous approaches that had little more than a list of canned responses."
Further reading: 1960s Chatbot ELIZA Beat OpenAI's GPT-3.5 In a Recent Turing Test Study


Comments Filter:
  • Teaching to the Test (Score:5, Interesting)

    by sound+vision ( 884283 ) on Friday June 14, 2024 @10:05PM (#64550593) Journal

    LLMs - The ultimate example of teaching to the test?

    • by SuperKendall ( 25149 ) on Friday June 14, 2024 @10:10PM (#64550595)

      Funny that the first and second posts are about the exact same thing: that if you were to construct from scratch a system with only one goal, to pass the Turing test, LLMs are how you would do it.

      They are built to output streams of words, each statistically the most likely next word given what other humans have written. So in response to almost anything, an LLM will produce a string of words that almost always looks and reads very human on the surface.
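      To make that concrete, here's a toy sketch of that idea (my own illustration; the bigram table is invented). Real LLMs learn statistics over subword tokens with a neural network, but the sampling loop has the same shape:

      import random

      # Toy bigram "model": counts of how often each word followed another.
      bigram_counts = {
          "i":     {"think": 5, "am": 3},
          "think": {"that": 6, "so": 4},
          "that":  {"is": 7, "was": 3},
          "is":    {"true": 4, "fine": 6},
      }

      def next_word(word):
          """Sample a next word in proportion to how often it followed `word`."""
          candidates = bigram_counts.get(word)
          if not candidates:
              return None  # no known continuation: stop
          return random.choices(list(candidates), weights=list(candidates.values()))[0]

      def generate(start, max_len=10):
          out = [start]
          while len(out) < max_len:
              w = next_word(out[-1])
              if w is None:
                  break
              out.append(w)
          return " ".join(out)

      print(generate("i"))  # e.g. "i think that is fine"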

      Not to mention that, just like real humans, an LLM is capable not just of complete fabrication, but also of doubling down on the fabrication if needed by providing more details of something completely made up!

      So it's no wonder an LLM was able to pass the Turing Test. But the question of more general ability to help is, I feel, still kind of up in the air, apart from targeted tasks.

    • by narcc ( 412956 )

      Eliza passed the Turing test. It's not that interesting. It doesn't tell us anything about the machine. All it does is highlight how ridiculous people can be.

      "Teaching to the test" is the whole game. It's not about making an intelligent machine, it's about making credulous humans think the machine is intelligent.

      • Re: (Score:3, Interesting)

        I guess it depends on the IQ of the person. It becomes clear pretty quickly that an ELIZA is not AI and certainly not a human.
        Or at least it takes some meta-knowledge, as in: OK, they want to test whether I can figure out that the other one is an AI ... how do I trick an AI into revealing itself?

        On the other hand, I once put an ELIZA into an IRC channel. It would drag people into a conversation if they mentioned its name, Pirx. Since I obviously made a new ElizaObject for each of Pirx's conversation partners, he never mixed anything up.

        It went ...

      • A human judging ELIZA to be a fellow human is not so much a case of ELIZA passing the Turing test, but of the human failing it.
        • A human judging ELIZA to be a fellow human is not so much a case of ELIZA passing the Turing test, but of the human failing it.

          That's easily said because we all know ELIZA's tricks. However, if you aren't trying to trip it up, it's very possible to have a "lucky" ELIZA session which just happens to match what you want to talk about with what ELIZA expects. You can go quite a few minutes before you get something obviously weird. A time-limited Turing test could easily be passed.

      • I feel like a lot of passed Turing tests lean pretty heavily on the novelty of the experience.
        When ELIZA came out, terminals were still a luxury and Adventure wasn't even a thing, so if you typed something into a computer and got anything even slightly resembling a human response, it's easy to see how a person could decide that's not how computers act, so it must be a person.

        ChatGPT seems very human when you first interact with it but once you know how modern LLMs behave you're much less likely to be fooled.

        • The other day I wanted to replace \n with ', ' at the end of every line of a file in Linux. It would only tell me about the tr command, because someone somewhere had recommended it for a different problem. Of course that was useless for me, because tr can only map characters 1:1.
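          For what it's worth, here's a minimal way to do that join without tr (a Python sketch rather than shell; "input.txt" is a placeholder filename):

          # Read all lines and join them with ", ".
          # tr(1) can't do this: it only maps single characters to single characters.
          with open("input.txt") as f:
              print(", ".join(f.read().splitlines()))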
        • by narcc ( 412956 )

          ChatGPT seems very human when you first interact with it but once you know how modern LLMs behave you're much less likely to be fooled.

          Exactly. As Joe Weizenbaum optimistically states in his 1966 paper ELIZA: A Computer Program For the Study of Natural Language Communication Between Man And Machine:

          "For in those realms machines are made to behave in wondrous ways, often sufficient to dazzle even the most experienced observer. But once a particular program is unmasked, once its inner workings are explained in language sufficiently plain to induce understanding, its magic crumbles away..."

          Though that might be overly optimistic. Some people, it seems, want to be fooled. Even some very smart and well-credentialed people working in the field.

        • I once showed someone a text based online game in the school library ... and they had trouble believing there could be a person at the other end. That was in the early days of the (public perception of the) internet, of course :)

      • It's not about making an intelligent machine, it's about making credulous humans think the machine is intelligent.

        ChatGPT can chat intelligently with me about building a moderately complex PHP application ... more intelligently than some coworkers I have had.

        It can also usually just build it ... given good requirements and feedback. Again, better than some coworkers I've had ...

        I don't know if that's "intelligent", in a philosophical sense, but it's pretty freakin' amazing, unless I move goalposts in massive fashion.

        • by narcc ( 412956 )

          ChatGPT can chat intelligently with me about building a moderately complex PHP application

          No, it can't. You're deluding yourself. This is nothing new. From Weizenbaum's 1966 ELIZA: A Computer Program for the Study of Natural Language Communication Between Man and Machine

          Some subjects have been very hard to convince that ELIZA is not human. This is a striking form of Turing's test

          My guess is that you really want the machine to be doing far more than it's doing, or to be capable of more than it is capable of doing. I'm not sure what that fantasy is, but I can say with absolute certainty that it is just a fantasy.

          It can also usually just build it ... given good requirements and feedback.

          Pay very close attention to the second part of that. You are the one building it, just in a needlessly roundabout way.

          • Shrug

            I've been a programmer for over 20 years ... you can try to convince me that a useful tool that I'm using isn't useful, but I doubt you'll succeed, lol.

            You seem to have skipped (or failed to grok, anyway) the key bit, "I don't know if that's "intelligent", in a philosophical sense, but it's pretty freakin' amazing,"

            It is pretty amazing, and the goalpost moving attempts to have it not be amazing are ... unconvincing.

            You are the one building it, just in a needlessly roundabout way.

            Of course I am ... with a tool. And no, actually, "taking way longer to do it without a tool" is what would be the needlessly roundabout way.

            • by narcc ( 412956 )

              you can try to convince me that a useful tool that I'm using isn't useful, but I doubt you'll succeed

              All I can do is toss you a rope. It's up to you to pull yourself out.

              You seem to have skipped (or failed to grok, anyway) the key bit,

              You should learn to read. I even quote directly from that line.

              And no, actually, "taking way longer to do it without a tool" is what would be the needlessly roundabout way.

              Sigh... You're not using a tool, you're playing with a toy. Again, the only reason you feel more productive is that you're more focused on your work because you've found a way to incorporate your new toy. You'll find yourself using your toy less and less as the novelty wears off and fighting with it stops being fun.

              I've seen this same silly scenario play out many times

      • No, no AI has passed the Turing test or even come close to it, not even this most recent one.

        The goal is not to fool one human (Eliza), or fool 50% of a group of idiots after 5 minutes. The test is to fool the best AI experts in the world over a long time.

        • by narcc ( 412956 )

          You're just making up whatever silly rules you want. Eliza easily passed the Turing test back in the 1960s. That was the whole point.

          Some subjects have been very hard to convince that ELIZA is not human. This is a striking form of Turing's test

          Weizenbaum in ELIZA: A Computer Program For the Study of Natural Language Communication Between Man And Machine, 1966

          I was startled to see how quickly and how very deeply people conversing with DOCTOR became emotionally involved with the computer and how unequivocally they anthropomorphized it

          ELIZA created the most remarkable illusion of having understood the minds of the many people who conversed with it

          Weizenbaum in Computer Power and Human Reason, 1976

          His secretary famously wanted her sessions with the machine confidential, convinced that the machine understood and empathized with her.

          Once my secretary, who had watched me work on the program for many months and therefore surely knew it to be merely a computer program, started conversing with it. After only a few interchanges with it, she asked me to leave the room.

          What I had not realized is that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people.

          ibid

          Eliza is still fooling people today. It's remarkable how such a s

          • The Turing test is duck typing of AI taken to its extreme.

            The logic is: if no one can tell something apart from human intelligence, then in this setting it _is_ human intelligence.

            Note the NO ONE in that sentence: it doesn't mean some idiot, it means everybody; or, if we are trying to be efficient about it, the people in the world who are best at telling whether something is an AI.

    • That speaks volumes about what you think of humans. Why can't we just make up some kind of test that would reveal it's an AI? I am sure we can.
      • Do you really think you can? Talk with thousands of different humans of all ages, races, education levels, etc. and you will see that a lot of people would probably fail that test too and be judged to be AI. Any new AI will be better than its predecessor and will be able to 'mimic' human behaviour; I expect GPT-6 or 7 even to be at such a level that it will be smarter than any human (with only a very few exceptions)
        • by HiThere ( 15173 )

          Define "smarter". There ARE reasonable definitions that would validate your assertion, but most of them wouldn't. LLMs don't even understand that the external world exists, much less understand it. (This *is* an artifact of their training, but they're trained the way they are because there's this huge source of "data" available for free.)

          Properly structured, the AI would have smaller modules handling specific data inputs, and other modules interconnecting those modules. And it would also need to experiment

          • Actually, proper AI based on neural networks learns exactly the same way humans do, only much faster (depending on the hardware behind it). So proper AI certainly can understand the external world, depending on the sensors it has access to. But AIs like GPT-4 are still shortcuts, as the hardware needed for proper AI is still not small/advanced enough to make it 'smart' enough on its own. But in the next 10 years it will be, thanks to current AI for extra advancements. Too many people still think that AI is no
    • An AI or a human can be easily spotted by using trigger words or employing general craziness. Or inserting words in another language and playing with the words.

      The problem with the result and the test in general is that people are not testing the system, but expect to have a believable conversation.

      So, there is a preset desire to be fooled.

    • At this time, ChatGPT may be a good companion for a lonely senior with dementia. Eventually it will be much better.
  • by crunchy_one ( 1047426 ) on Friday June 14, 2024 @10:14PM (#64550603)
    I'd hardly say that a 54% score is passing, but maybe the researchers were grading on a curve. The 5-minute test time smacks of tuning the test to get closer to the desired result. Overall, I'm unimpressed.
    • Yikes (Score:5, Insightful)

      by SuperKendall ( 25149 ) on Friday June 14, 2024 @10:18PM (#64550617)

      I'd hardly say that a 54% score is passing, but maybe the researchers were grading on a curve. The 5-minute test time smacks of tuning the test to get closer to the desired result.

      Just like AI itself, this story is overhyped and underperforms in the real world.

    • by XXongo ( 3986865 ) on Friday June 14, 2024 @10:31PM (#64550633) Homepage

      I'd hardly say that a 54% score is passing

      Agree. If the 500 people just said "human" or "AI" at random, the number of people who would guess it was human is 50%... plus or minus 4.5%.

      Not significant.
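      For reference, the back-of-the-envelope math behind that margin, under the simplifying assumption that all 500 verdicts concern a single witness (in the study they were split across four):

      import math

      n = 500   # verdicts
      p = 0.5   # proportion expected from coin-flip guessing

      # Standard error of a proportion, and the usual 95% interval.
      se = math.sqrt(p * (1 - p) / n)
      print(f"SE = {se:.2%}, 95% CI = +/- {1.96 * se:.1%}")
      # prints: SE = 2.24%, 95% CI = +/- 4.4%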

      • Humans are supposedly bad at choosing random numbers on a scale of 1 to 10 or 100, but are we really any better at making random binary choices?

        I'm actually curious what the breakdown would be if you got 500 people and just asked them, *before* the chat session, whether they thought they were going to talk to a human or a machine.

      • Your comment only makes sense if we were testing the interrogators, as when people take a multiple-choice exam and an average score of 1/(number of possible answers) equals random guessing.
        But we're not; we're testing the thing they have to render verdicts on (the quality of the exam, in the analogy). The only sensible comparison, then, is between the baseline (say, a tried and tested set of exam questions) and the novel thing we're interested in (a new exam written by an inexperienced teacher). The

    • by quantaman ( 517394 ) on Friday June 14, 2024 @10:37PM (#64550639)

      I'd hardly say that a 54% score is passing, but maybe the researchers were grading on a curve. The 5-minute test time smacks of tuning the test to get closer to the desired result. Overall, I'm unimpressed.

      Considering humans only scored 67%, the 54% doesn't bother me so much.

      But the 5-minute threshold is really low: if you look at the paper, the "conversations" are the equivalent of a series of 5 back-and-forth text messages. I suspect the AIs would break down in longer conversations.

      The other thing I noticed is that the AIs relied on a few tricks; I suspect that if the interrogators had some practice the AIs wouldn't do so well [youtu.be].

      • if you look at the paper the "conversations" are the equivalent of a series of 5 back and forth text messages

        This is incorrect. The screenshots are just excerpts and certainly do not reflect 5 minutes of conversation.

        More noteworthy is that ChatGPT was explicitly instructed to play a little dumb. This of course lowers expectations of the people interacting with the AI. The full instruction prompt (Appendix A):

        You are about to play a Turing Test game as part of an experiment
        you are taking part in as a worker on Prolific. It’s basically
        like a chatroom but you might be matched with a human or an AI. It
        seems lik

        • by ceoyoyo ( 59147 )

          I like how half of the instruction prompt is basically "don't write too good."

        • I'll bet you could get a LOT of hits with that as a description of a person in a dating app.

          Maybe it's time to retire the Turing Test, and introduce the Tinder Test.

          Jokes aside, I think I remember someone who was using ChatGPT to screen women on some dating apps!

          • Yep, apparently he fooled his now wife for months with it: https://gizmodo.com/guy-used-c... [gizmodo.com]

            I know Slashdot likes to shit on everything AI nowadays, but I think one of the main things we need to keep in mind is that most people aren't particularly .. impressive. We've shed a lot of blood, sweat, and tears to come up with the systems that allow us to cooperate and do great things as a collective in spite of all our incompetence and many, many individual flaws. And even then it's still an uphill battle in man

        • if you look at the paper the "conversations" are the equivalent of a series of 5 back and forth text messages

          This is incorrect. The screenshots are just excerpts and certainly do not reflect 5 minutes of conversation.

          In the paper they describe them as: "Figure 1: A selection of conversations between human interrogators (green) and witnesses (grey). One of these four conversations is with a human witness, the rest are with AI. Interrogator verdicts and ground truth identities are below (to allow readers to indirectly participate)."

          They definitely imply those are the whole conversations and they don't show anything more extensive elsewhere, and honestly, the numbers work out.

          Assume 30 seconds for the interrogator to think of

          • They definitely imply those are the whole conversations and they don't show anything more extensive elsewhere, and honestly, the numbers work out.

            If you look at figure 5, there is an "in-progress conversation" with 4 messages back and forth, with 2:43 visible: "the timer at the top shows time remaining in the game".

            So it's more like 137 s (5:00 minus 2:43) over 8 messages, ~= 17 s/message, which would work out to about 10 messages per participant in a 5-minute conversation.
            That is more than 5, but I do agree that it is not a lot of conversation.

            • I saw that bit as well; I assumed that was just a particularly fast-fingered interrogator. Really, they should have been clearer on that (assuming it wasn't explicitly mentioned in the text somewhere).

      • Curious why they wouldn't ask it presumptive shit like which part of Washington it grew up in, where it graduated, how old it was when its mommy and daddy got divorced. I'm not an expert on LLMs but something tells me they'd fuck that up pretty bad. The reason I think that is several fold:

        - LLMs rely on the prompts themselves to present the illusion of being intelligent. Try to, for example, picture what an AI chatbot might say with no prompt at all. Basically all it could do is either do something preprogr

      • But the 5-minute threshold is really low, if you look at the paper the "conversations" are the equivalent of a series of 5 back and forth text messages. I suspect the AIs would break down in longer conversations.

        "It took over 100 questions for Rachel, didn't it???"

      • You are correct.

        I've only watched one ChatGPT long-form "debate" (this one: https://www.youtube.com/live/O... [youtube.com] )

        but it's probably clear to anybody reading a transcript that it's machine intelligence.

        The five-minute limit is obviously there to skew the results, but perhaps that's enough time to learn why my phone refill didn't get credited?

        P.S. Your link carried the si= tracking parameter; worth stripping if you need this account to remain pseudonymous from Google and the people with access to its data (which is now linked in their TIA).

    • I'd hardly say that a 54% score is passing, but maybe the researchers were grading on a curve. The 5-minute test time smacks of tuning the test to get closer to the desired result. Overall, I'm unimpressed.

      If 54% is a failing grade, then 67% is almost a failing grade. If the 54% represents unimpressive failure, does the 67% mean that real humans are on the boundary of unimpressive failure?

      I see the close to 50% number as impressive. It means that humans are struggling to distinguish the machine from the human. A 100% doesn't indicate identification of a human subject but rather exposure of a rigged test.

      • If 54% is a failing grade, then 67% is almost a failing grade. If the 54% represents unimpressive failure, does the 67% mean that real humans are on the boundary of unimpressive failure?

        This seems like it should be an important factor in reaching a conclusion from this experiment. Either the test subjects are particularly bad at distinguishing humans from computers, the subjects were expecting human-like conversation (which would probably bias the results), or the human was intentionally trying to act like a computer.

    • Personally, if I wanted my AI to pass the Turing test, I'd design it to be clumsy and kind of stupid, with little factual knowledge except for weather info. GPT tries too hard.
      • by gweihir ( 88907 )

        In the regular Turing Test competitions, the machines that do best simulate kids with a mental disability.

    • I found the paper, which neither TFS nor TFA references or links to, here: https://arxiv.org/abs/2301.100... [arxiv.org]

      There's no data, code, or other resources; I couldn't find any mention of error margins; and the distribution graph at the end looks pretty "chunky," which I interpret as a certain degree of randomness in correctly identifying bots vs. humans.

      Either the Turing test is fundamentally flawed in practice because of participant error/inconsistency (called reliability in assessment, i.e. the same cand
    • by gtall ( 79522 )

      Nah, they merely compared the results with humans who are also prone to making shit up and figured it was close to human intelligence.

    • There's really an easy way to show that it doesn't pass the Turing test: converse on some very unrelated topics. If it comes off as having too much knowledge of way too many topics, it's a computer.
    • by gweihir ( 88907 )

      Yep, same here. Also, the human participants play a huge role. This says more about the limitations of the average human.

    • I'll just jump in and push a few things off the table, without prejudice.

      TRUMP COULDN'T PASS A TURING TEST.

      What does that say?
      • What Trump said the other day wouldn't pass the Turing test.

        QUOTE
        I said, “So there’s a shark ten yards away from the boat, ten yards or here. Do I get electrocuted? If the boat is sinking, water goes over the battery, the boat is sinking. Do I stay on top of the boat and get electrocuted? Or do I jump over by the shark and not get electrocuted?”.

        Because I will tell you, he didn’t know the answer. He said, “You know, nobody’s ever asked me that ques
    • I'd hardly say that a 54% score is passing, but maybe the researchers were grading on a curve.

      Humans only scored 67%

  • Nah... (Score:5, Insightful)

    by Turkinolith ( 7180598 ) on Friday June 14, 2024 @10:18PM (#64550613)
    I don't think a 54% really passes, you know? Like... a 54% is above halfway, but still an "F". Come back when you get at least a 60.
    • It would only be a pass if humans scored equivalently, the interrogators were all leading AI experts, and they had as much time as they wanted.

  • by skegg ( 666571 ) on Friday June 14, 2024 @10:19PM (#64550619)

    Feels like some dates I've been on ...

  • Got a min? (Score:4, Insightful)

    by markdavis ( 642305 ) on Friday June 14, 2024 @10:41PM (#64550641)

    >"The conversations lasted five minutes"

    I believe that is too short to really explore enough to make a meaningful guess, especially for typed interaction. And if it only fooled 54% of people, that seems even less impressive. Of course, the ACTUAL human was only believed to be human by 67%, so, again, I think it was not enough time. Imagine what a difference between 10 seconds, 1 minute, 5 minutes, 10 minutes, 20 minutes, 40 minutes would make in your judgement.

    • I dunno, most conversations I've had lately with actual human beings haven't been very convincing. Turing may not have foreseen the depressing direction of humanity. Today I watched a guy exit an interstate at 75 mph and cross three lanes of traffic to the right on the access road, his left turn signal on the whole time.

      The bar is very low.

  • by Anonymous Coward
    Alan Turing gets credit for doing cool shit before people realized it, but the Turing test is nonsense. No one who actually does ANN research cares about the Turing test.

    This is just a publicity stunt. Nothing to see here.

    • Isn't the point of the "Attention Is All You Need" paper that you don't even need ANNs anymore?

      • by ceoyoyo ( 59147 )

        Sure. Except that a transformer is arguably an ANN itself, and 2/3 of a typical "transformer" model is actually straight up perceptrons out of the 50s.

        I saw the title "Attention is about 1/3 of what you need" somewhere. Much more accurate.

    • The Turing Test is great: it will let you know whether you've created a human-level AI. That's why the people who make shitty AIs always cripple the test.

    • Most researchers don't care about the Turing Test because they don't care about developing an AI that can generally pass for human, they only care about more specific objectives. That doesn't make the Turing Test nonsense, it just isn't applicable to their research.
      • by HiThere ( 15173 )

        OTOH, the real problem with "The Turing Test" is that it's NEVER properly implemented. The only reason it's ever done is as a PR stunt. Of course, passing the Turing test isn't that useful. It was intended as an argument, not as something to actually be striven for.

  • If the turing test is meant to gauge if an artificial intelligence can pass as human, then we've definitely set the bar too low.

    If human beings are supposed to be the gold standard of intelligence on this planet, then I think the Universe also set the bar too low :|

    • by pacinpm ( 631330 )

      The way a human being can?

      And how do you know I enjoy strawberries the same way you do? Because I told you so?

  • The first is to build a computer that can carry on a conversation like a human.

    The second is to have it carry on a conversation with a human that's dumber than a box of rocks.

    One must always be careful to distinguish between correlation and causation.

  • "I believe that in about fifty years' time it will be possible, to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning."

    A. M. Turing (1950) Computing Machinery and Intelligence. Mind 49: 433-460.

  • ELIZA..was judged to be human just 22% of the time. GPT-3.5 scored 50% while the human participant scored 67%.

    The results of a Turing test today might just say far more about the decline in average intelligence since the 1950s than what the test seeks to validate. You can fool people rather easily these days (as evidenced by the 22% of humans who thought a 1960s chatbot was human). Machines will get smarter as a whole. Humanity? Not so much.

    Also, the actual human graded out at only 67% meatsack? Either the is-it-a-human testing standards are miscalibrated, or the “human” tested was made by Cyberdyne

    • Also, the actual human graded out at only 67% meatsack?

      If you look at Figure 1 in the paper, at least 1 human apparently got cute and tried to be deliberately misleading. The interviewees were (footnote 2):
      A: GPT-4
      B: Human
      C: GPT-3.5
      D: ELIZA

      I don't know why they chose to include this bit of conversation as the human example, but if a lot of the actual humans interviewed acted like this I can definitely see people going "AI" for them.

      • Also, the actual human graded out at only 67% meatsack?

        If you look at Figure 1 in the paper, at least 1 human apparently got cute and tried to be deliberately misleading. The interviewees were (footnote 2): A: GPT-4 B: Human C: GPT-3.5 D: ELIZA

        I don't know why they chose to include this bit of conversation as the human example, but if a lot of the actual humans interviewed acted like this I can definitely see people going "AI" for them.

        Perhaps the humans involved in test measurement forgot for a moment how easy it is to trigger the native human sarcasm gland when the environment is right.

        (Human, channeling Ace Ventura after reading their instructions) ”Oh, so you want me to ‘act human’ here? Allllrighty then..”

        It's ironic that in order for the machine to understand how smart we want it to be, it first has to accurately deduce exactly how stupid we can be at any given moment. It's going to be downright scary

    • I barely trust humans to handle a butter knife without injuring themselves.

  • But how does it do on the Voight-Kampff test?

  • by ET3D ( 1169851 ) on Saturday June 15, 2024 @03:00AM (#64550897)

    ChatGPT isn't designed to mimic humans. It might succeed better with some prompting, but it might also require specific training. It'd have to be prompted to go off on tangents, hold illogical beliefs and get insulted if you don't agree with them, and show other human conversational traits. Basically, it should be trained as AS (artificial stupidity) rather than AI to feel much closer to human.

    I imagine that some future games will have AI which is good at this.

  • That's what the test has finally proven: that in its current format it's unfit for purpose.
  • But that's usually how things progress: first outright fraud, then gimmicky tricks that evade rather than pass the test, and now we're in a phase of ambiguity. The next step is increasing clarity.
  • ELIZA, a system pre-programmed with responses but with no large language model
    That is not correct.

    ELIZA is a cleverly programmed linguistic hack (perhaps fewer than 500 lines of C, in a single file).

    Only when she is, linguistically speaking, out of clues does she fall back on preprogrammed answers.
    Preprogrammed answers would be, e.g.:
    - Tell me more about {it} (where {it} might be replaced by whatever ELIZA thinks the topic is, but it can also be a plain sentence)
    - That is interesting!
    - Why are you thinking like that?

    In general, however, ELIZA transforms statements into questions, using simple text-replacement rules.

    So you write: "My job is so boring." She does not know what "job" and "boring" mean, but she will transform "My" into "Your" and make a sentence like: "Why is your job (so) boring?"
    Then you give three answers, which she ignores for now.
    And she answers: "Tell me more about it."
    After you have given more answers, she will go back and, using simple linguistics-based text-replacement rules, transform some of your input into a new question.
    Sometimes she picks two sentences and says something like: "What is worse, having a boring job or a stupid boss?"

    And so on. There are not really many preprogrammed answers, and there is no real attempt to put any AI in it at all.

    You can probably even find a fully fledged ELIZA implementation via Wikipedia; a minimal sketch of the mechanism follows.
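    To illustrate, here is a toy sketch in Python of that transform-or-fall-back loop (my own illustration, not Weizenbaum's script; the rules and canned answers are invented):

    import random
    import re

    # First-person words get "reflected" into second person before reuse.
    REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

    # (pattern, response templates): {0}, {1} are filled from the match groups.
    RULES = [
        (re.compile(r"my (\w+) is (.+)", re.I),
         ["Why is your {0} {1}?", "Tell me more about your {0}."]),
        (re.compile(r"i am (.+)", re.I), ["Why are you {0}?"]),
    ]

    # Canned fallbacks for when no transformation rule matches.
    FALLBACKS = ["That is interesting!", "Why are you thinking like that?",
                 "Tell me more about it."]

    def reflect(fragment):
        """Swap first-person words for second-person ones ("my" -> "your")."""
        return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())

    def respond(statement):
        for pattern, templates in RULES:
            m = pattern.search(statement)
            if m:
                return random.choice(templates).format(*map(reflect, m.groups()))
        return random.choice(FALLBACKS)  # out of clues: canned answer

    print(respond("My job is so boring"))  # e.g. "Why is your job so boring?"
    print(respond("Nice weather today"))   # e.g. "That is interesting!"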

  • I would not even say that was legitimately the Turing Test.

    • by HiThere ( 15173 )

      In this example, only around 67% of them.

      Of course, this wasn't the real Turing test as originally proposed. That would be more difficult. But I'm not sure the AI wouldn't do better than the average human. (Well, I'm restricting "average human" to a native speaker of whatever language the test is conducted in. One of the questions Turing used as an example was asking the respondent to compose a poem. I forget whether he specified the style...but most people *could* manage a Limerick, if they thought of

      • That's quite a lot of false negatives. I suppose placing the bar higher doesn't make the test useless.
        Then there's the reverse Turing test: convince the jury you're merely software.

  • About two years ago people stopped using the Turing test, as LLMs started to fool humans. The new way to benchmark AIs is leaderboards, on which people compare different answers and choose which AI does better.

  • People are so dumb now a bowl of alphabet soup could pass the Turing Test.
  • Most humans have been dumber than computers since the introduction of calculators.
  • Computing Machinery and Intelligence [ox.ac.uk] by A. M. Turing

    "I propose to consider the question, 'Can machines think?' This should begin with definitions of the meaning of the terms 'machine' and 'think.'

    The definitions might be framed so as to reflect so far as possible the normal use of the words, but this attitude is dangerous. If the meaning of the words 'machine' and 'think' are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, 'Can machines think?' is to be sought in a statistical survey such as a Gallup poll. But this is absurd."
  • The Turing test is trivial compared to human vision.

  • Turing tests have always been based on questions that the technology of the day wouldn't be able to answer but a human theoretically could. The tests never *really* were able to distinguish between a computer and a human; they were just able to distinguish (to a degree) computers of a certain time period from humans. The fact that technology can now pass these tests is just a sign of the improvement in technology. It doesn't mean anything beyond that.

  • So far, Turing tests have focused on questions that a computer can't answer, but a human can.

    Maybe it's time to come up with questions that a computer *can* answer, but a human couldn't possibly be able to answer.

  • I wonder how it responds to un-aliving itself.
