AI

ChatGPT Can Now Respond With Spoken Words (nytimes.com) 39

ChatGPT has learned to talk. OpenAI, the San Francisco artificial intelligence start-up, released a version of its popular chatbot on Monday that can interact with people using spoken words. As with Amazon's Alexa, Apple's Siri, and other digital assistants, users can talk to ChatGPT and it will talk back. From a report: For the first time, ChatGPT can also respond to images. People can, for example, upload a photo of the inside of their refrigerator, and the chatbot can give them a list of dishes they could cook with the ingredients they have. "We're looking to make ChatGPT easier to use -- and more helpful," said Peter Deng, OpenAI's vice president of consumer and enterprise product. OpenAI has accelerated the release of its A.I. tools in recent weeks. This month, it unveiled a version of its DALL-E image generator and folded the tool into ChatGPT.

ChatGPT attracted hundreds of millions of users after it was introduced in November, and several other companies soon released similar services. With the new version of the bot, OpenAI is pushing beyond rival chatbots like Google Bard, while also competing with older technologies like Alexa and Siri. Alexa and Siri have long provided ways of interacting with smartphones, laptops and other devices through spoken words. But chatbots like ChatGPT and Google Bard have more powerful language skills and are able to instantly write emails, poetry and term papers, and riff on almost any topic tossed their way.

OpenAI has essentially combined the two communication methods. The company sees talking as a more natural way of interacting with its chatbot. It argues that ChatGPT's synthetic voices -- people can choose from five different options, including male and female voices -- are more convincing than others used with popular digital assistants. Over the next two weeks, the company said, the new version of the chatbot would start rolling out to everyone who subscribes to ChatGPT Plus, a service that costs $20 a month. But the bot can respond with voice only when used on iPhones, iPads and Android devices. The bot's synthetic voices are more natural than many others on the market, though they still can sound robotic.


Comments Filter:
  • If AI is any part of the service chain the entire thing is an investment opportunity.
    It might not be quite as bad as the 90s tech booms around video compression over modem... but it is close.
    Where is that wavelet technology?
In French, "Chat GPT" literally translates to: "The Cat, I Farted".

      • Ha! Had I a mod point, you would have it.
      • by deek ( 22697 )

        Well, not quite literally, but it can translate that way with some imagination and a bit of mischief.

        "Cat" (masc) is "chat" in French, though it's pronounced "sha".
        "I farted" is "j'ai pété", pronounced jay-pay-tay. Which, with a little linguistic juggling, roughly matches the pronunciation of the acronym GPT.

        Anyway, thought people reading this might be interested in exactly how you get from "ChatGPT" to "Cat, I farted".

  • Is there a link that doesn’t require me to pay?

    • Re:link is paywalled (Score:5, Informative)

      by kvezach ( 1199717 ) on Monday September 25, 2023 @02:38PM (#63876023)
      I'm not seeing the paywall, but perhaps it's geolocation-based. Here's the text:

      ChatGPT has learned to talk.

      OpenAI, the San Francisco artificial intelligence start-up, released a version of its popular chatbot on Monday that can interact with people using spoken words. As with Amazon’s Alexa, Apple’s Siri, and other digital assistants, users can talk to ChatGPT and it will talk back.

      For the first time, ChatGPT can also respond to images. People can, for example, upload a photo of the inside of their refrigerator, and the chatbot can give them a list of dishes they could cook with the ingredients they have.

      “We’re looking to make ChatGPT easier to use — and more helpful,” said Peter Deng, OpenAI’s vice president of consumer and enterprise product.

OpenAI has accelerated the release of its A.I. tools in recent weeks. This month, it unveiled a version of its DALL-E image generator and folded the tool into ChatGPT.

      ChatGPT attracted hundreds of millions of users after it was introduced in November, and several other companies soon released similar services. With the new version of the bot, OpenAI is pushing beyond rival chatbots like Google Bard, while also competing with older technologies like Alexa and Siri.

      Alexa and Siri have long provided ways of interacting with smartphones, laptops and other devices through spoken words. But chatbots like ChatGPT and Google Bard have more powerful language skills and are able to instantly write emails, poetry and term papers, and riff on almost any topic tossed their way.

      OpenAI has essentially combined the two communication methods.

The company sees talking as a more natural way of interacting with its chatbot. It argues that ChatGPT’s synthetic voices — people can choose from five different options, including male and female voices — are more convincing than others used with popular digital assistants.

      Over the next two weeks, the company said, the new version of the chatbot would start rolling out to everyone who subscribes to ChatGPT Plus, a service that costs $20 a month. But the bot can respond with voice only when used on iPhones, iPads and Android devices.

      The bot’s synthetic voices are more natural than many others on the market, though they still can sound robotic. Like other digital assistants, it can struggle with homonyms. When The New York Times asked the new ChatGPT how to spell “gym,” it said: “J-I-M.”

      But one of the advantages of a chatbot like ChatGPT is that it can correct itself. When told “No, the other kind of gym,” the bot replied: “Ah, I see what you’re referring to now. The place where people exercise and work out is spelled G-Y-M.”

      Though ChatGPT’s voice interface is reminiscent of earlier assistants, the underlying technology is fundamentally different. ChatGPT is driven primarily by a large language model, or L.L.M., which has learned to generate language on the fly by analyzing huge amounts of text culled from across the internet.

      Older digital assistants, like Alexa and Siri, acted like command-and-control centers that could perform a set number of tasks or give answers to a finite list of questions programmed into their databases, such as “Alexa, turn on the lights” or “What’s the weather in Cupertino?” Adding new commands to the older assistants could take weeks. ChatGPT can respond authoritatively to virtually any question thrown at it in seconds — though it is not always correct.

      As OpenAI is transforming ChatGPT into something more like Alexa or Siri, companies like Amazon and Apple are transforming their digital assistants into something more like ChatGPT.

Last week, Amazon previewed an updated system for Alexa that aims for more fluid conversation about “any topic.” It is driven in part by a new L.L.M. and has other upgrades to pacing and intonation.

  • I had a cute little freeware program back in the days of Win98 that could read text aloud. It sounded very robotic, yes, but it was only a 50 MB freeware program. Surely this is nothing different? It just reads aloud the written reply it would generate anyway.

    • Re: (Score:3, Informative)

      by bjoast ( 1310293 )
      This is nothing like your 90s freeware, grandpa.
      • by Calydor ( 739835 )

I'm sure it's somewhat more advanced, but at its core, all this is is a text-to-speech plugin for ChatGPT. At the other end is speech-to-text. This isn't speech-to-speech; it's speech-to-text, then ChatGPT's usual algorithms to generate a reply, and then text-to-speech. It doesn't understand what it's saying any more than it understands what it's writing.
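        The text-in-the-middle architecture being described is easy to sketch. This is illustrative stub code, not OpenAI's actual implementation; all three stage functions are stand-ins, and in a real app a speech recognizer (e.g. Whisper), an LLM endpoint, and a TTS engine would sit in their places:

```python
def transcribe(audio: bytes) -> str:
    """Speech-to-text stub: a real app would run a recognizer here."""
    return audio.decode("utf-8")  # pretend the audio 'is' its transcript


def chat(prompt: str) -> str:
    """Chatbot stub: note the LLM only ever sees and produces text."""
    return f"You said: {prompt}"


def synthesize(text: str) -> bytes:
    """Text-to-speech stub: a real app would render audio here."""
    return text.encode("utf-8")


def voice_chat(audio_in: bytes) -> bytes:
    """Glue the three stages together; no stage shares state with another."""
    transcript = transcribe(audio_in)  # speech -> text
    reply_text = chat(transcript)      # text   -> text (the only "smart" part)
    return synthesize(reply_text)      # text   -> speech


print(voice_chat(b"how do I spell gym?").decode("utf-8"))
# -> You said: how do I spell gym?
```

        The point of the sketch is the middle function's signature: `chat` takes a string and returns a string, so whatever happens at the edges, the model itself is doing exactly what it does in the text interface.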

        • Is it definitely not speech-to-speech? GPT4 is inherently multimodal, so going directly from speech to the model embedding rather than through text would seemingly be a natural approach.
      • by narcc ( 412956 )

        This is nothing like your 90s freeware, grandpa.

        It's a lot closer than you think.

        Yes, transformers have improved text to speech and speech to text, but gluing them onto a chatbot doesn't represent an advance in any way. The chatbot in the middle still gets text as input and still produces text as output.

        This is just a goofy gimmick. If you needed a sign that ChatGPT wasn't living up to its lofty promises, this is it.

      • Dr. Sbaitso respectfully begs to disagree.

    • Have you ever used a Wendy's automated drive thru order desk? They sound very natural.
    • Good ol' memories of typing swear words in Microsoft Sam when I was an immature little boy. Good times indeed.
  • That is when it learns to listen, react to non-verbal communication, be interruptible, and read body language, including the use of fingers.

  • "But the bot can respond with voice only when used on iPhones, iPads and Android devices."

    Link is paywalled, so I can't read it. Can anyone here explain to a complete noob like me what the technical reasons might be for the bot not being able to respond with voice on desktops?

  • I just want it to have the option of using the voice of Majel Barrett-Roddenberry.
  • Is the speech generation integrated into the model, or is it TTS built into the app? Similarly with speech recognition: is it just Whisper in the app, or is it part of the model?

    Given the voice selection I have to believe it’s just TTS in there, but I'm curious about the recognition.

  • Editors at the New York Times were either bamboozled by what isn't a novel feature or just accomplices in spreading the reach of hypebeast techbro press releases without any scrutiny or filtering for newsworthiness.

    That is more interesting to me than a non-story about a non-landmark in the field of taking-suckers-money-and-pretending-to-develop-AI, printed in a supposed paper of record (stop snickering, some people still believe), because some innovator piped the text output from an LLM into a software Speak &

  • If this thing apologizes once more for an obvious mistake it made...
  • The Bing app has had ChatGPT with voice for a while already.

    What *is* interesting is the image thing. Show it an image and get ideas for what to make with the items it sees. That sounds actually useful!

    • My brother, who is blind, has been using a ChatGPT-enabled app for a couple of weeks; he takes a picture with his phone and the app describes what is in the photo. The ability to ask more questions about the photo makes this a very valuable tool for the blind. He has had it read him the menu at a restaurant, identify which knob does what on his guitar amplifier (and even how to reproduce the sound Keith Richards used on a particular record), taken a photo of his living room and had the app locate his keys
      • That's amazing, and very cool!

        I wasn't trying to suggest that voice isn't important, only that it already existed for ChatGPT.

  • Not as impressive as pi.ai (which also happens to be completely free).

    The voice mode of that app is my go-to when I want to wow someone with how advanced AI is now.
