New Nvidia AI Agent, Powered by GPT-4, Can Train Robots (venturebeat.com) 12
Nvidia Research announced today that it has developed a new AI agent, called Eureka, that is powered by OpenAI's GPT-4 and can autonomously teach robots complex skills. From a report: In a blog post, the company said Eureka, which autonomously writes reward algorithms, has, for the first time, trained a robotic hand to perform rapid pen-spinning tricks as well as a human can. Eureka has also taught robots to open drawers and cabinets, toss and catch balls, and manipulate scissors, among nearly 30 tasks.
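To make "autonomously writes reward algorithms" concrete, here is a minimal sketch of the kind of reward function an LLM might emit for the pen-spinning task. All names, parameters, and thresholds here are invented for illustration; they are not from the Eureka paper or Nvidia's code.

```python
import math

# Hypothetical reward function in the style the summary describes:
# GPT-4 writes plain code that scores simulator states. Every identifier
# below is made up for this sketch.
def pen_spin_reward(pen_angular_velocity: float,
                    target_velocity: float,
                    pen_height: float) -> float:
    # Reward spinning at close to the target speed (peaks at 1.0)...
    spin_term = math.exp(-abs(pen_angular_velocity - target_velocity))
    # ...and penalize letting the pen fall below a height threshold.
    drop_penalty = 1.0 if pen_height < 0.1 else 0.0
    return spin_term - drop_penalty
```

The RL training loop then just maximizes this scalar; the "trick" Eureka automates is generating and iteratively refining functions like this one instead of a human hand-tuning them.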
"Reinforcement learning has enabled impressive wins over the last decade, yet many challenges still exist, such as reward design, which remains a trial-and-error process," Anima Anandkumar, senior director of AI research at Nvidia and an author of the Eureka paper, said in the blog post. "Eureka is a first step toward developing new algorithms that integrate generative and reinforcement learning methods to solve hard tasks."
So... (Score:2)
such bullshit (Score:2)
Using the term 'reward' implies that the robots have wants and/or needs, that the robots expect something in return for something they did. Software has no such desires. What does it even mean to reward a robot?
Re: (Score:2)
For the full deep dive, watch these lectures: https://rail.eecs.berkeley.edu... [berkeley.edu]
Re: (Score:2)
Fair enough, thanks for the links. I'm feeling like we really need to stop using human terms with software. What your video link calls rewards are really a measure of progress towards a goal and I still think reward is the wrong term to use. Software does not have wants and needs and is not motivated by rewards. Intelligence does not have to be human.
Re: (Score:2)
It's pretty straightforward.

Positive value = this is good.
Negative value = this is bad.

You've just created a reward system. The simple way of looking at it is that you enter the desired goal, and if the AI gets closer to it, you say "this is good." It marks its attempt as good and analyzes what it did. As it adjusts its parameters, it eventually figures out which combinations of actions increase the good score (the reward) and which detract from it, slowly isolating the specific things it does that subtract and the things that add.
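The loop described above can be sketched in a few lines. This is a toy hill-climbing example (not Eureka's actual algorithm): the program keeps whichever random adjustment scores better, and converges on the goal without "wanting" anything.

```python
import random

def reward(value: float, goal: float) -> float:
    # Scalar score: closer to the goal means a higher (less negative) reward.
    return -abs(goal - value)

def train(goal: float, steps: int = 1000, seed: int = 0) -> float:
    rng = random.Random(seed)
    best = 0.0
    best_score = reward(best, goal)
    for _ in range(steps):
        candidate = best + rng.uniform(-1.0, 1.0)  # try a small random adjustment
        score = reward(candidate, goal)
        if score > best_score:                     # "this is good": keep the change
            best, best_score = candidate, score
    return best
```

Calling `train(42.0)` ends up very close to 42; "reward" here is just a number being maximized, which is the sense in which RL uses the word.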
This Will Come In Handy (Score:2)
Not like Skynet at all. nooooo. Asimov rule? (Score:2)
Re: (Score:2)
Several of Asimov's stories were about how the 'Three Laws' were inadequate and robots could work around them.