
OpenAI's AI Reasoning Model 'Thinks' In Chinese Sometimes, No One Really Knows Why

OpenAI's "reasoning" AI model, o1, has exhibited a puzzling behavior of "thinking" in Chinese, Persian, or some other language -- "even when asked a question in English," reports TechCrunch. While the exact cause remains unclear, as OpenAI has yet to provide an explanation, AI experts have proposed a few theories. From the report: Several on X, including Hugging Face CEO Clement Delangue, alluded to the fact that reasoning models like o1 are trained on datasets containing a lot of Chinese characters. Ted Xiao, a researcher at Google DeepMind, claimed that companies including OpenAI use third-party Chinese data labeling services, and that o1 switching to Chinese is an example of "Chinese linguistic influence on reasoning."

"[Labs like] OpenAI and Anthropic utilize [third-party] data labeling services for PhD-level reasoning data for science, math, and coding," Xiao wrote in a post on X. "[F]or expert labor availability and cost reasons, many of these data providers are based in China." [...] Other experts don't buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution.

Other experts don't buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution. Rather, these experts say, o1 and other reasoning models might simply be using languages they find most efficient to achieve an objective (or hallucinating). "The model doesn't know what language is, or that languages are different," Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. "It's all just text to it."

Tiezhen Wang, a software engineer at AI startup Hugging Face, agrees with Guzdial that reasoning models' language inconsistencies may be explained by associations the models made during training. "By embracing every linguistic nuance, we expand the model's worldview and allow it to learn from the full spectrum of human knowledge," Wang wrote in a post on X. "For example, I prefer doing math in Chinese because each digit is just one syllable, which makes calculations crisp and efficient. But when it comes to topics like unconscious bias, I automatically switch to English, mainly because that's where I first learned and absorbed those ideas."

[...] Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautioned that we can't know for certain. "This type of observation on a deployed AI system is impossible to back up due to how opaque these models are," they told TechCrunch. "It's one of the many cases for why transparency in how AI systems are built is fundamental."


Comments Filter:
  • by fahrbot-bot ( 874524 ) on Tuesday January 14, 2025 @07:49PM (#65089399)

    OpenAI's AI Reasoning Model 'Thinks' In Chinese Sometimes, ...

    If it wants to fly the MiG-31 Firefox [wikipedia.org], it'll have to "think in Russian."

    • Re:Ya, well ... (Score:5, Informative)

      by Kisai ( 213879 ) on Tuesday January 14, 2025 @08:17PM (#65089427)

      My guess is that it's likely doing it for a practical reason.

      Chinese has a lot of characters, so specific concepts are likely more easily tokenized as single Chinese characters, whereas English is a hugely clunky language that requires 10 times as much verbosity to convey the same concept.

      It likely requires less memory to work in Chinese for that reason.

      Heck, every time I see a "400 billion parameter" LLM I think, "I'm sure that could be represented in 1/4th the size in Chinese." But then you have to add a translation layer for the ingress and egress of data, which means the accuracy plummets if there isn't a perfect 1:1 mapping of the ingress language to Chinese and back to the egress language.
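      For anyone who wants to sanity-check the token-count part of that guess, here's a rough sketch using OpenAI's open-source tiktoken tokenizer. The cl100k_base encoding is an assumption (nobody outside OpenAI knows what o1 uses internally), and the comparison doesn't always favor Chinese:

      ```
      # Compare how many tokens the "same" question costs in English vs. Chinese.
      # Assumes the tiktoken package; cl100k_base is one of OpenAI's published encodings.
      import tiktoken

      enc = tiktoken.get_encoding("cl100k_base")

      english = "Please explain why the sky appears blue during the day."
      chinese = "请解释为什么白天天空看起来是蓝色的。"  # the same question in Chinese

      print(len(enc.encode(english)), "tokens for the English version")
      print(len(enc.encode(chinese)), "tokens for the Chinese version")
      ```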

      • by AmiMoJo ( 196126 )

        Was going to say the same thing. Sometimes I think in Japanese simply because the language and mindset are a better fit for the subject at hand.

        • Understanding Japanese is helpful when programming in PostScript or using an RPN calculator.

          Korean, Hungarian, and Basque also use SOV postfix grammar.

      • Re:Ya, well ... (Score:5, Interesting)

        by ceoyoyo ( 59147 ) on Tuesday January 14, 2025 @10:05PM (#65089611)

        Nobody outside of OpenAI really knows how they've implemented their reasoning system, and the article isn't clear about what "thinks in" means. My guess is that the former has the model produce a chain of intermediate results in a language it knows, and the latter means some of those intermediate results are in different languages.

        OpenAI also doesn't say how they train on multiple languages. I asked ChatGPT 4o mini "Pouvez-vous décrire un lion?" ("Can you describe a lion?"), swapping various words for English; it responded in French unless all the words of the question were English. It even responded in French when the query was "Can you describe un lion?" Asking "Can you describe un tree?" also got a French response, with a little snark about my shitty French: 'Bien sûr ! Un "tree", ou arbre en français, est une structure....' ("Of course! A "tree", or arbre in French, is a structure....")

        Interestingly, when I asked it something where the non-English part is a phrase it might have seen in English text ("What does je ne sais quoi mean?"), it responded once with French and English side by side and the rest of the time in English.

        So it seems to be able to understand mixed language input but the response is biased towards non-English, and the output is always in a single language except for quotes. That might mean the output is forced into whatever language some not very good detector thinks the input is in, and the intermediates sometimes drift a bit because the detector is kind of crap. That could well be because some particularly efficient tokens are in the other language, or it could be because the training data that contained some concept was in that language.

        I'm curious whether "thinks in X" means all of the intermediate output was in X, whether it was a mix of languages, or whether that's even possible to tell.
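        A minimal sketch of that kind of probing, assuming the official openai Python client and an OPENAI_API_KEY in the environment (the model name is an assumption; substitute whatever you have access to):

        ```
        # Sweep a few mixed-language prompts and see which language comes back.
        from openai import OpenAI

        client = OpenAI()

        prompts = [
            "Pouvez-vous décrire un lion?",     # all French: "Can you describe a lion?"
            "Can you describe un lion?",        # English frame, French noun phrase
            "Can you describe un tree?",        # English with a stray French article
            "What does je ne sais quoi mean?",  # French phrase common in English text
        ]

        for prompt in prompts:
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
            )
            print(prompt, "->", resp.choices[0].message.content)
        ```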

        • Nobody outside of OpenAI really knows how they've implemented their reasoning system, and the article isn't clear about what "thinks in" means

          The o1 and o3 models use chain of thought to "think" about and reason step by step through the question you ask. Concretely, "thinking" is just emitting text about the "reasoning" process they're going through before providing an answer. Whether or not it is "thinking" in any real sense, lots of empirical results have shown that having models emit a chain of thought before responding improves their accuracy. One intuition for why is that emitting more text lets the model spend more compute on answering a question.
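          A toy way to see that effect with an ordinary model, again assuming the official openai client (the question and prompt wording are made up for illustration):

          ```
          # Ask the same question with and without a "step by step" instruction and compare.
          from openai import OpenAI

          client = OpenAI()
          question = "A train leaves at 14:35 and arrives at 16:10. How long is the trip?"

          def ask(prompt: str) -> str:
              resp = client.chat.completions.create(
                  model="gpt-4o-mini",
                  messages=[{"role": "user", "content": prompt}],
              )
              return resp.choices[0].message.content

          print("Direct:", ask(question + " Answer with just the duration."))
          print("CoT:   ", ask(question + " Think step by step, then give the duration."))
          ```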

    • by twosat ( 1414337 )

      Mitchell Gant (Clint Eastwood) must think in Russian to make his thought-controlled Firefox shoot down the other Firefox. https://www.youtube.com/watch?... [youtube.com]

  • I mean, I can understand Chinese, but Persian? That's peculiar to say the least.

  • "Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautioned that we can't know for certain. "

    OK, don't listen to anything this guy says.

  • AIs don't "think" (Score:3, Informative)

    by dfghjk ( 711126 ) on Tuesday January 14, 2025 @08:20PM (#65089431)

    "OpenAI's "reasoning" AI model..."

    It's not a "reasoning model" because it doesn't "reason"; that's just a provocative name chosen to suggest OpenAI knows more than it does. And AIs don't "think" either, so they don't think in Chinese. Nor is this behavior, when described properly, particularly "puzzling". It's interesting for sure, but clearly they train the model using multiple languages, so you should expect learned information to potentially map to multiple languages. Brains exhibit this too.

    • Re: (Score:3, Insightful)

      The definition of "think" is vacuous, to say the least.
      It infers, and I challenge you to prove that your "thinking" is anything more than that. It's been constructed so that it can combine inferences into logical steps to reason, a skill I think you're willfully throwing by the wayside because you're intimidated by how unspecial it makes you feel.

      Reasoning LLMs are an area of study. OpenAI didn't invent the term, nor are they anywhere close to the only people using it.
      Some of those people are Ph.D.
      • *are Ph.D.s in universities
        • LLMs are just fancy autocompletes, which is useful for some things; for example, I think you could do with a better one.

          • LLMs are transformers: they tokenize a string of words and fire the tokens through hundreds of layers of millions of interconnected virtual neurons, collecting context along the way.
            What comes out on the other side is a result of the weights of all those neurons.

            To call that an "autocomplete" is too stupid to even be wrong.
            I'd love to see you prove that you're anything more than an autocomplete, and not even a very fancy one.
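            For what it's worth, here's roughly what a single attention head in one of those layers computes, boiled down to a few lines of numpy (toy sizes, random weights, no claim about any particular model's internals):

            ```
            # Toy single-head self-attention: each token's representation gets mixed
            # with context from the other tokens according to (here random) weights.
            import numpy as np

            rng = np.random.default_rng(0)
            seq_len, d_model = 4, 8                      # 4 tokens, 8-dim embeddings (toy sizes)

            x = rng.normal(size=(seq_len, d_model))      # token embeddings
            Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

            Q, K, V = x @ Wq, x @ Wk, x @ Wv
            scores = Q @ K.T / np.sqrt(d_model)          # how much each token attends to the others
            weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax per row
            out = weights @ V                            # context-mixed token representations

            print(out.shape)                             # (4, 8): same shape as the input, now context-aware
            ```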
            • I read somewhere his mother is a _____
              Somebody completed, but we're not sure who.

            • by gweihir ( 88907 )

              That is just pseudo-profound bullshit, bereft of insight. LLMs _are_ essentially just autocomplete with some things wrapped around them for a nicer presentation. They cannot even do symbolic computation; they always have to stay at a concrete, literal word level. You cannot get insight or understanding that way. You can fake it in a way that deeply impresses less smart people, though.

              • LLMs _are_ essentially just autocomplete with some things wrapped around them for a nicer presentation.

                Wrong.

                They cannot even do symbolic computation; they always have to stay at a concrete, literal word level.

                Laughably wrong.

                You cannot get insight or understanding that way.

                Based on what? Your magical definition of insight and understanding?

                You can fake it in a way that deeply impresses less smart people, though.

                And in a way that causes other people to overestimate their intelligence, apparently.

                • by gweihir ( 88907 )

                  You really need to get some clue before you shoot your mouth off. Well, that is probably not the way you like to do things. Clueless grandstanding seems to be what you do best.

      • But it has not been *constructed* to do logical inference at all... it has been trained to simulate the output by copying (badly) from examples, and then it has been trained to simulate the output students would generate if you ask them to "show your work"

        Like a lazy student you know is always cheating, almost on principle, but who you think is smart enough that, if you force them to cheat through enough homework and tests, they will learn and develop true understanding... despite themselves.

        The above *can* work

        • But it has not been *constructed* to do logical inference at all...

          Incorrect.

          it has been trained to simulate the output by copying (badly) from examples

          Absurdly incorrect.

          and then it has been trained to simulate the output students would generate if you ask them to "show your work"

          Where the fuck are you making this bullshit up from?

          Some researchers seem to think the "latent space" LLMs develop from training is equivalent to internalized knowledge, and that reasoning (if not sentience) can "emerge" from generative behavior - and the rest is optimization. But this is controversial and hardly a scientific consensus - it just gets more press.

          Everything about what goes on inside LLMs is controversial. They're fucking black boxes due to the complexity.
          Sentience should be controversial, if for no other reason than that an LLM has atrociously little state that isn't hard-coded, and so can't possibly experience anything like life for more than an unimaginably small fraction of time and space, if it were to do so at all.

          As to whether the latent space is equivalent to internalized k

          • I think you need to calm down and stop watching so much MLST. You've become stuck in a simplified worldview about the inner workings of the brain that was never intended as more than a crude analogy to entice interest in a data fitting field of study called neural networks long ago.

            Mathematically, there's not much going on here other than a humongously large dataset and a worldwide competition to find ways to represent it geometrically, and preferably in linear algebraic operations that happen to be optim

      • by gweihir ( 88907 )

        In other words, you have nothing worthwhile to say and no insight into the matter, but you are deeply emotionally committed to believing this tech will bring us a bright future, if only all these pesky dumb deniers would simply go away.

        Did I sum that up correctly?

        • but you are deeply emotionally committed to believing this tech will bring us a bright future

          This is a literal strawman. Does constructing strawmen to slay imaginary enemies make you feel clever?
          I never said anything about our future, and in no way do I think AI brings us anything good that outweighs its bad.

          if only all these pesky dumb deniers would simply go away.

          Go away? No need- you can stand still. The world is flying past you so quickly that I imagine your head is spinning.

          Did I sum that up correctly?

          In a way that only a 4 year old trying to argue with an adult could- congratulations ;)

          • by gweihir ( 88907 )

            In a way that only a 4 year old trying to argue with an adult could- congratulations ;)

            Try a professor trying to get through to a not very smart but hugely arrogant student. Of course, I would just fail you if I do not manage to reach you. And yes, I have done that in the past.

    • by gweihir ( 88907 )

      "Provocative name"? You are too kind, What is happening is that they are lying through their teeth to keep the investor money flowing. "AGI" redefined, "reasoning" that is not reasoning, "thinking" that has no resemblance to actual thinking and generally ascribing (always future) capabilities to their crappy technology that would be direly needed, but that is simply does not have and cannot attain.

      Somebody called that a "permanent delivery scam", where it is always the next version that brings the big break

  • by gurps_npc ( 621217 ) on Tuesday January 14, 2025 @08:27PM (#65089441) Homepage

    It was trained on all languages; it doesn't think in any one of them, it thinks in all of them.

    It's the equivalent of me and my friend arguing about whether Spock or Gandalf would make a better Jedi. We have been trained in Star Trek, Lord of the Rings and Star Wars, so we think in all of them together.

    When asked a question about something from one of them, we will of course think about the related subjects.

  • but it was hacked by Xi!

  • common (Score:5, Funny)

    by bugs2squash ( 1132591 ) on Tuesday January 14, 2025 @08:48PM (#65089481)
    Maybe we all think in Chinese; we're just unaware of it if we don't speak the language.
  • Some thoughts from the mechanistic interpretability front: I was toying with a few gpt4all models just the other day (including their smaller reasoning model), injecting random data into the weights. As the quantity of random data increased, the performance of the models shifted through a few phases when tested at temperature 0.0 (for reproducibility):

    1. Differentiation: small amount of randomness, the results for simple queries like "What's your name?" changed (ex. "a 19-year old girl" became "a 1

    • by Draconi ( 38078 )

      here's a python script for corrupting GGUFs if you want to try it out :)

      input: "model_path_here"
      output: (the path you gave appended with _2)

      ```
      import os
      import shutil
      import numpy as np
      from typing import Optional

      def analyze_and_modify_gguf_file(file_path: str, modify_bytes: int = 128) -> Optional[str]:
          """Perturbs bytes in the middle of a GGUF file (where the quantized weights live).
          Copies the file first, then modifies the copy (input path + "_2") and returns its path."""
          new_path = file_path + "_2"
          shutil.copyfile(file_path, new_path)
          offset = os.path.getsize(new_path) // 2              # middle of the file
          with open(new_path, "r+b") as f:
              f.seek(offset)
              chunk = bytearray(f.read(modify_bytes))
              noise = np.random.randint(-1, 2, size=len(chunk))  # -1, 0, or +1 per byte
              for i, delta in enumerate(noise):
                  chunk[i] = (chunk[i] + int(delta)) % 256
              f.seek(offset)
              f.write(bytes(chunk))
          return new_path
      ```
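      A minimal usage sketch (the path is a placeholder, as above):

      ```
      corrupted = analyze_and_modify_gguf_file("model_path_here", modify_bytes=256)
      print("wrote corrupted copy to", corrupted)
      ```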

      • by Draconi ( 38078 )

        (note: the above is hard-coded for 4-bit quantized models like the GGUFs common in gpt4all)

  • Me, a native Finnish speaker (poems & texts & stuff), some Swedish, and vewy, vewy good English, I know. Whilst I am watching (The Walking Dead, lately) I think in English; same when talking in English.
    Everything else I think in Finnish.
    The WAY one thinks in different languages is... well, different. Languages have limits and freedoms. Like "taivas" in Finnish: it is both "heaven" and "sky". Then "nuoska", "pyry", "loska" and so forth are all weather-related names for "snow".
    In the first example, Finnish la

  • by retiarius ( 72746 ) on Tuesday January 14, 2025 @11:46PM (#65089737)

    ... From national treasure Tom Lehrer's song "Wernher von Braun",
    note the last stanza. (All lyrics are now dedicated to the public domain
    by Tom, himself):

    Lyrics
    And what is it that put America in the forefront of the nuclear nations?
    And what is it that will make it possible to spend twenty billion dollars of your money
    to put some clown on the moon? Well, it was good old American know how, that's what,
    as provided by good old Americans like Dr. Wernher von Braun!
    Gather 'round while I sing you of Wernher von Braun,

    A man whose allegiance
    Is ruled by expedience.
    Call him a Nazi, he won't even frown,
    "Ha, Nazi, Schmazi, " says Wernher von Braun.

    Don't say that he's hypocritical,
    Say rather that he's apolitical.
    "Once the rockets are up, who cares where they come down?
    That's not my department," says Wernher von Braun.

    Some have harsh words for this man of renown,
    But some think our attitude
    Should be one of gratitude,
    Like the widows and cripples in old London town,
    Who owe their large pensions to Wernher von Braun.

    You too may be a big hero,
    Once you've learned to count backwards to zero.
    "In German oder English I know how to count down,
    Und I'm learning Chinese!" says Wernher von Braun.

  • "For example, I prefer doing math in Chinese because each digit is just one syllable, which makes calculations crisp and efficient."

    Damn you, seven!!!

    • by dohzer ( 867770 )

      Makes me wonder if I've been performing calculations wrong, because the number of syllables has never mattered. "The word for the number 7 has two syllables, so I'll double the answer."

  • ... as bandied about in university philosophy departments for years.

    Only it's the reverse, where messages are passed into the room
    in English, processed in Chinese, then spit back into English.

  • ...don't buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution.

    Bears repeating. It appears.

  • It is cluelessly bumbling around through the fog, never seeing anything beyond the next step and never even understanding that next step.

    Second, the observed phenomenon nicely shows this is actually fundamental research, because nobody has a clue what this machine does and why. Hence, at the very least 30 years to practical applicability, and probably much longer.

  • These LLMs are essentially implementations of the Chinese room concept.

    • by Epeeist ( 2682 )

      These LLMs are essentially implementations of the Chinese room concept.

      Searle's thought experiment is meant to demonstrate the difference between syntactic and semantic information. While LLMs may work at the syntactic level, they do not operate at the semantic level.

  • OpenAI's AI Reasoning Model 'Thinks' In Chinese Sometimes, No One Really Knows Why

    If a human suddenly switches language, then we can ask them why and they can explain.

    In the case of LLMs, we don't have that explanation.

  • by LoadLin ( 6193506 ) on Wednesday January 15, 2025 @04:32AM (#65090061)

    Definitely; not all languages think or express things the same way.

    Have you ever found that you know some term in one language and, when you don't have that word in the language you're speaking, you just import the word?

    I'm Spanish. And sometimes you have words like "crush" (in the romantic sense). When I'm speaking Spanish and just teasing someone, I use that word.

    Well, I do that also because I think the other person knows the word too, but the thing is... it's a better word than other Spanish expressions like "que... te encoñaste de ella?" (roughly, "so... you got infatuated with her?"). It means the same... but sounds rude, as the word "coño" is considered a bad word that on its own is used the same way as "pussy".

    We also have "enamorarse", but it literally means "to fall in love", and that's... not exactly the same thing when someone is just at the stage of being fascinated by and attracted to someone without really understanding or knowing that other person.

    When I need to, I just switch languages. No problem.

    I think it's the same here. If the shorter path to thinking about something is using a better-suited language, because the related ideas are better expressed there, the AI switches.

    Have you tried writing to an AI in mixed languages? It understands it without problems, and the response can be very chaotic. Sometimes I just express myself in Spanish and the AI answers in English. Fortunately that's not a problem for me.
    But... yeah... switching to Asian languages would be a problem for me. X-D
