AI

ChatGPT Can Now Respond With Spoken Words (nytimes.com) 39

ChatGPT has learned to talk. OpenAI, the San Francisco artificial intelligence start-up, released a version of its popular chatbot on Monday that can interact with people using spoken words. As with Amazon's Alexa, Apple's Siri, and other digital assistants, users can talk to ChatGPT and it will talk back. From a report: For the first time, ChatGPT can also respond to images. People can, for example, upload a photo of the inside of their refrigerator, and the chatbot can give them a list of dishes they could cook with the ingredients they have. "We're looking to make ChatGPT easier to use -- and more helpful," said Peter Deng, OpenAI's vice president of consumer and enterprise product. OpenAI has accelerated the release of its A.I. tools in recent weeks. This month, it unveiled a version of its DALL-E image generator and folded the tool into ChatGPT.

ChatGPT attracted hundreds of millions of users after it was introduced in November, and several other companies soon released similar services. With the new version of the bot, OpenAI is pushing beyond rival chatbots like Google Bard, while also competing with older technologies like Alexa and Siri. Alexa and Siri have long provided ways of interacting with smartphones, laptops and other devices through spoken words. But chatbots like ChatGPT and Google Bard have more powerful language skills and are able to instantly write emails, poetry and term papers, and riff on almost any topic tossed their way.

OpenAI has essentially combined the two communication methods. The company sees talking as a more natural way of interacting with its chatbot. It argues that ChatGPT's synthetic voices -- people can choose from five different options, including male and female voices -- are more convincing than others used with popular digital assistants. Over the next two weeks, the company said, the new version of the chatbot would start rolling out to everyone who subscribes to ChatGPT Plus, a service that costs $20 a month. But the bot can respond with voice only when used on iPhones, iPads and Android devices. The bot's synthetic voices are more natural than many others on the market, though they still can sound robotic.


Comments Filter:
  • If AI is any part of the service chain the entire thing is an investment opportunity.
    It might not be quite as bad as the 90s tech booms around video compression over modem... but it is close.
    Where is that wavelet technology?
In French, "Chat GPT" literally translates to: "The Cat, I Farted".

      • Ha! Had I a mod point, you would have it.
      • by deek ( 22697 )

        Well, not quite literally, but it can translate that way with some imagination and a bit of mischief.

        "Cat" (masc) is "chat" in French, though it's pronounced "sha".
        "I farted" is "j'ai pété", pronounced jay-pay-tay. Which, with a little linguistic juggling, roughly matches the pronunciation of the acronym GPT.

        Anyway, thought people reading this might be interested in exactly how you get from "ChatGPT" to "Cat, I farted".

  • Is there a link that doesn’t require me to pay?

    • Re:link is paywalled (Score:5, Informative)

      by kvezach ( 1199717 ) on Monday September 25, 2023 @02:38PM (#63876023)
      I'm not seeing the paywall, but perhaps it's geolocation-based. Here's the text:

      ChatGPT has learned to talk.

      OpenAI, the San Francisco artificial intelligence start-up, released a version of its popular chatbot on Monday that can interact with people using spoken words. As with Amazon’s Alexa, Apple’s Siri, and other digital assistants, users can talk to ChatGPT and it will talk back.

      For the first time, ChatGPT can also respond to images. People can, for example, upload a photo of the inside of their refrigerator, and the chatbot can give them a list of dishes they could cook with the ingredients they have.

      “We’re looking to make ChatGPT easier to use — and more helpful,” said Peter Deng, OpenAI’s vice president of consumer and enterprise product.

OpenAI has accelerated the release of its A.I. tools in recent weeks. This month, it unveiled a version of its DALL-E image generator and folded the tool into ChatGPT.

      ChatGPT attracted hundreds of millions of users after it was introduced in November, and several other companies soon released similar services. With the new version of the bot, OpenAI is pushing beyond rival chatbots like Google Bard, while also competing with older technologies like Alexa and Siri.

      Alexa and Siri have long provided ways of interacting with smartphones, laptops and other devices through spoken words. But chatbots like ChatGPT and Google Bard have more powerful language skills and are able to instantly write emails, poetry and term papers, and riff on almost any topic tossed their way.

      OpenAI has essentially combined the two communication methods.

The company sees talking as a more natural way of interacting with its chatbot. It argues that ChatGPT’s synthetic voices — people can choose from five different options, including male and female voices — are more convincing than others used with popular digital assistants.

      Over the next two weeks, the company said, the new version of the chatbot would start rolling out to everyone who subscribes to ChatGPT Plus, a service that costs $20 a month. But the bot can respond with voice only when used on iPhones, iPads and Android devices.

      The bot’s synthetic voices are more natural than many others on the market, though they still can sound robotic. Like other digital assistants, it can struggle with homonyms. When The New York Times asked the new ChatGPT how to spell “gym,” it said: “J-I-M.”

      But one of the advantages of a chatbot like ChatGPT is that it can correct itself. When told “No, the other kind of gym,” the bot replied: “Ah, I see what you’re referring to now. The place where people exercise and work out is spelled G-Y-M.”

      Though ChatGPT’s voice interface is reminiscent of earlier assistants, the underlying technology is fundamentally different. ChatGPT is driven primarily by a large language model, or L.L.M., which has learned to generate language on the fly by analyzing huge amounts of text culled from across the internet.

      Older digital assistants, like Alexa and Siri, acted like command-and-control centers that could perform a set number of tasks or give answers to a finite list of questions programmed into their databases, such as “Alexa, turn on the lights” or “What’s the weather in Cupertino?” Adding new commands to the older assistants could take weeks. ChatGPT can respond authoritatively to virtually any question thrown at it in seconds — though it is not always correct.

      As OpenAI is transforming ChatGPT into something more like Alexa or Siri, companies like Amazon and Apple are transforming their digital assistants into something more like ChatGPT.

Last week, Amazon previewed an updated system for Alexa that aims for more fluid conversation about “any topic.” It is driven in part by a new L.L.M. and has other upgrades to pacing and intonation.

  • I had a cute little freeware program back in the days of Win98 that could read text aloud. It sounded very robotic, yes, but it was only a 50 MB freeware program. Surely this is nothing different? It just reads aloud the written reply it would generate anyway.

    • Re: (Score:3, Informative)

      by bjoast ( 1310293 )
      This is nothing like your 90s freeware, grandpa.
      • by Calydor ( 739835 )

I'm sure it's somewhat more advanced, but at its core, all this is is a text-to-speech plugin for ChatGPT. At the other end is speech-to-text. This isn't speech-to-speech; it's speech-to-text, then ChatGPT's usual algorithms to generate a reply, and then text-to-speech. It doesn't understand what it's saying any more than it understands what it's writing.
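        The text-in-the-middle architecture being described is easy to sketch. This is illustrative stub code, not OpenAI's actual implementation; all three stage functions are stand-ins, and in a real app a speech recognizer (e.g. Whisper), an LLM endpoint, and a TTS engine would sit in their places:

```python
def transcribe(audio: bytes) -> str:
    """Speech-to-text stub: a real app would run a recognizer here."""
    return audio.decode("utf-8")  # pretend the audio 'is' its transcript


def chat(prompt: str) -> str:
    """Chatbot stub: note the LLM only ever sees and produces text."""
    return f"You said: {prompt}"


def synthesize(text: str) -> bytes:
    """Text-to-speech stub: a real app would render audio here."""
    return text.encode("utf-8")


def voice_chat(audio_in: bytes) -> bytes:
    """Glue the three stages together; no stage shares state with another."""
    transcript = transcribe(audio_in)  # speech -> text
    reply_text = chat(transcript)      # text   -> text (the only "smart" part)
    return synthesize(reply_text)      # text   -> speech


print(voice_chat(b"how do I spell gym?").decode("utf-8"))
# -> You said: how do I spell gym?
```

        The point of the sketch is the middle function's signature: `chat` takes a string and returns a string, so whatever happens at the edges, the model itself is doing exactly what it does in the text interface.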

        • Is it definitely not speech-to-speech? GPT4 is inherently multimodal, so going directly from speech to the model embedding rather than through text would seemingly be a natural approach.
      • by narcc ( 412956 )

        This is nothing like your 90s freeware, grandpa.

        It's a lot closer than you think.

        Yes, transformers have improved text to speech and speech to text, but gluing them onto a chatbot doesn't represent an advance in any way. The chatbot in the middle still gets text as input and still produces text as output.

        This is just a goofy gimmick. If you needed a sign that ChatGPT wasn't living up to its lofty promises, this is it.

      • Dr. Sbaitso respectfully begs to disagree.

    • Have you ever used a Wendy's automated drive thru order desk? They sound very natural.
    • Good ol' memories of typing swear words in Microsoft Sam when I was an immature little boy. Good times indeed.
  • That is when it learns to listen, react to non-verbal communication, be interruptible, and read body language, including the use of fingers.

  • "But the bot can respond with voice only when used on iPhones, iPads and Android devices."

    Link is paywalled, so I can't read it. Can anyone here explain to a complete noob like me what the technical reasons might be for the bot not being able to respond with voice on desktops?

  • I just want it to have the option of using the voice of Majel Barrett-Roddenberry.
  • Is the speech generation integrated into the model, or is it TTS built into the app? Similarly with speech recognition: is it just Whisper in the app, or is it part of the model?

    Given the voice selection I have to believe it’s just TTS in there, but I'm curious about the recognition.

  • Editors at the New York Times were either bamboozled by what isn't a novel feature or just accomplices in spreading the reach of hypebeast techbro press releases without any scrutiny or filtering for newsworthiness.

    That is more interesting to me than a non-story about a non-landmark in the field of taking-suckers-money-and-pretending-to-develop-AI, printed in a supposed paper of record (stop snickering, some people still believe), because some innovator piped the text output from an LLM into a software Speak &

  • If this thing apologizes once more for an obvious mistake it made...
  • The Bing app has had ChatGPT with voice for a while already.

    What *is* interesting is the image thing. Show it an image and get ideas for what to make with the items it sees. That sounds actually useful!

    • My brother, who is blind, has been using a ChatGPT-enabled app for a couple of weeks; he takes a picture with his phone and the app describes what is in the photo. The ability to ask more questions about the photo makes this a very valuable tool for the blind. He has had it read him the menu at a restaurant, identify which knob does what on his guitar amplifier (and even how to reproduce the sound Keith Richards used on a particular record), taken a photo of his living room and had the app locate his keys
      • That's amazing, and very cool!

        I wasn't trying to suggest that voice isn't important, only that it already existed for ChatGPT.

  • Not as impressive as pi.ai (which also happens to be completely free).

    The voice mode of that app is my go-to when I want to wow someone with how advanced AI is now.
