AI

OpenAI's Sora Turns AI Prompts Into Photorealistic Videos (wired.com)

An anonymous reader quotes a report from Wired: We already know that OpenAI's chatbots can pass the bar exam without going to law school. Now, just in time for the Oscars, a new OpenAI app called Sora hopes to master cinema without going to film school. For now a research product, Sora is going out to a few select creators and a number of security experts who will red-team it for safety vulnerabilities. OpenAI plans to make it available to all wannabe auteurs at some unspecified date, but it decided to preview it in advance. Other companies, from giants like Google to startups like Runway, have already revealed text-to-video AI projects. But OpenAI says that Sora is distinguished by its striking photorealism -- something I haven't seen in its competitors -- and its ability to produce longer clips than the brief snippets other models typically do, up to one minute. The researchers I spoke to won't say how long it takes to render all that video, but when pressed, they described it as more in the "going out for a burrito" ballpark than "taking a few days off." If the hand-picked examples I saw are to be believed, the effort is worth it.

OpenAI didn't let me enter my own prompts, but it shared four instances of Sora's power. (None approached the purported one-minute limit; the longest was 17 seconds.) The first came from a detailed prompt that sounded like an obsessive screenwriter's setup: "Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes." The result is a convincing view of what is unmistakably Tokyo, in that magic moment when snowflakes and cherry blossoms coexist. The virtual camera, as if affixed to a drone, follows a couple as they slowly stroll through a streetscape. One of the passersby is wearing a mask. Cars rumble by on a riverside roadway to their left, and to the right shoppers flit in and out of a row of tiny shops.

It's not perfect. Only when you watch the clip a few times do you realize that the main characters -- a couple strolling down the snow-covered sidewalk -- would have faced a dilemma had the virtual camera kept running. The sidewalk they occupy seems to dead-end; they would have had to step over a small guardrail to a weird parallel walkway on their right. Despite this mild glitch, the Tokyo example is a mind-blowing exercise in world-building. Down the road, production designers will debate whether it's a powerful collaborator or a job killer. Also, the people in this video -- who are entirely generated by a digital neural network -- aren't shown in close-up, and they don't do any emoting. But the Sora team says that in other instances they've had fake actors showing real emotions.

"It will be a very long time, if ever, before text-to-video threatens actual filmmaking," concludes Wired. "No, you can't make coherent movies by stitching together 120 of the minute-long Sora clips, since the model won't respond to prompts in the exact same way -- continuity isn't possible. But the time limit is no barrier for Sora and programs like it to transform TikTok, Reels, and other social platforms."

"In order to make a professional movie, you need so much expensive equipment," says Bill Peebles, another researcher on the project. "This model is going to empower the average person making videos on social media to make very high-quality content."

Further reading: OpenAI Develops Web Search Product in Challenge To Google
  • by ClueHammer ( 6261830 ) on Thursday February 15, 2024 @05:07PM (#64243018)
    All the humans will be "employed" to do the one job the AIs can't do: maintain their physical servers.
    • Re:Soon... (Score:5, Interesting)

      by _0x0nyadesu ( 7184652 ) on Thursday February 15, 2024 @05:20PM (#64243040)

      I've been playing with Stable Diffusion on my personal desktop. It's basically a python script you have running in the background with a web GUI.

      So far in just a few days I've had it create some amazing art. All on my own machine. In fact it's addictive and now I want to throw even more GPU at it.

      I actually can't see myself being satisfied with even ten 4090s running at full tilt. That's how fun this is.

      We haven't seen anything yet. There will be centuries of jobs just scaling up the computational technology to make all this possible.

      It's early days. A 512x512 image takes a few minutes to generate if you give it a healthy amount of keywords to work with.

      A video would need substantially more rendering power.
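
      For context, this is roughly what that local setup boils down to once the web GUI is stripped away: a minimal sketch in Python, assuming the Hugging Face diffusers library. The checkpoint name and prompt are illustrative, and render time depends heavily on the GPU.

        import torch
        from diffusers import StableDiffusionPipeline

        # Load a Stable Diffusion 1.x checkpoint (the name is an illustrative choice).
        pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5",
            torch_dtype=torch.float16,
        ).to("cuda")

        # The "healthy amount of keywords" that steers the image.
        prompt = ("snowy city street at dusk, cherry blossoms, cinematic lighting, "
                  "highly detailed, 35mm photo")

        image = pipe(prompt, height=512, width=512, num_inference_steps=30).images[0]
        image.save("output.png")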

      No one in their 30s today is gonna run out of work. I promise.

      • Re: (Score:3, Interesting)

        by Rei ( 128717 )

        You running Automatic1111?

        Wait until you start messing around with all the plugins :) Insane numbers of them.

        The tech just moves too fast. I switched from the diffusion model space to the LLM space for a while, and I feel like if I jumped back into the diffusion model space I'd be lost, given how it was hard to keep up even when I was actively focused on it.

        • You running Automatic1111?

          Yup...just started on it.

          I'm starting out using some time on Google's Colab servers.

          Once I get that down a bit more, gonna try to install it locally on an Intel Mac Pro. It's got an AMD Radeon Vega Pro II 32GB card in it...hoping that's powerful enough to do "something" decent.

          I think it all does better on an NVIDIA card, but we'll see.
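
          For what it's worth, here's a minimal, hedged sketch of how a diffusers-based script can pick a backend before loading the model. NVIDIA cards go through CUDA; Macs can try PyTorch's "mps" (Metal) backend, though how well that covers an Intel Mac Pro's AMD card is an open question; everything else falls back to CPU. The checkpoint name is an illustrative assumption.

            import torch
            from diffusers import StableDiffusionPipeline

            # Pick the best available backend; fall back to CPU if nothing else works.
            if torch.cuda.is_available():
                device, dtype = "cuda", torch.float16   # NVIDIA path
            elif torch.backends.mps.is_available():
                device, dtype = "mps", torch.float32    # Metal path on Macs (support varies)
            else:
                device, dtype = "cpu", torch.float32    # slow, but runs anywhere

            pipe = StableDiffusionPipeline.from_pretrained(
                "runwayml/stable-diffusion-v1-5", torch_dtype=dtype
            ).to(device)

            image = pipe("quick 512x512 sanity check, landscape photo",
                         height=512, width=512).images[0]
            image.save("sanity_check.png")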

      • Bit of an odd thing to say people won't run out of work while also saying that your AI model outputs amazing art.

        As the model improves the economic incentive to pay for any given piece of media would decrease. Why pay a hundred dollars or so to a starving artist to make it for you when you can just have it done from your own model? I do not think anyone can deny the economic impacts this tech has had and will have over the coming years.

        Unless you are implying that there will be jobs at GPU fabs for t

    • Well... they need to fix their severe hallucinations first.
      This is what nVidia's own "Chat with RTX" had to say when I asked it to compare RTX 3090 against RTX 4060.

      The RTX 3090 and RTX 4060 are both graphics cards from Nvidia, but they were released in different generations. The RTX 3090 was released in 2020 as part of the A1000 series, while the RTX 4060 was released in 2021 as part of the A6000 series.

      In terms of gaming performance, the RTX 4060 is generally considered to be faster than the RTX 3090. This is because the RTX 4060 is based on a newer and more powerful GPU architecture (Turing) than the RTX 3090 (Ampere), which means it has more processing power and can handle more complex graphics tasks.

      Additionally, the RTX 4060 has more VRAM (Video Random Access Memory) than the RTX 3090, which is an important factor in modern gaming as many games require more VRAM to run smoothly. The RTX 4060 has 8GB of VRAM, while the RTX 3090 has 24GB.

      Overall, the RTX 4060 is generally considered to be a better choice for gamers looking for the best performance, as it is faster and has more VRAM than the RTX 3090.

      There are so many mistakes and inaccuracies in that text, I don't even know where to begin.

      • I don't know anything about these graphics cards, but I asked GPT-4 to evaluate the accuracy of the text you pasted. It said the following. How close is it?

        RTX 3090 Series Misclassification: The RTX 3090 is correctly identified as part of Nvidia's GeForce RTX 30 series, utilizing the Ampere architecture. It was indeed released in 2020 but not as part of an "A1000 series," which does not exist in Nvidia's GeForce or RTX lineup.

        RTX 4060 Details: The RTX 4060 does exist and is part of the RTX 40 series, featur

        • Spot-on.
          I only have access to GPT 3.5 which doesn't know about RTX 4060, but yeah, GPT-4 is absolutely correct.

          • Actually GPT-4's first answer was "as of my training in April(?) 2023 the RTX 4060 does not exist," so I did a second request telling it to search the web for info on that card and then answer again.
    • All the humans will be "employed" to do the one job the AIs can't do: maintain their physical servers.

      Robots are also becoming more and more capable, together with AIs - I don't see why robots can't replace humans for server maintenance.

  • Red team a current AI? That'll be difficult (/s). Hey ChatGPT, how do you say "fish-in-a-barrel" in 45 different languages?
  • Average shot length (Score:5, Informative)

    by Hadlock ( 143607 ) on Thursday February 15, 2024 @05:30PM (#64243060) Homepage Journal

    The average shot length for most movies is between 6 and 12 seconds, and for more modern movies it's on the shorter end. Even for long shots, they used to use wind-up cameras and could only shoot 15 seconds at a time. You could totally do an "Into the Spider-Verse"-style movie with this today. You'll struggle with the necessary long shots, but really almost all shots are pretty short.
     
    If you're looking for a movie with really long shots, "Children of Men" has a continuous ~12 minute shot near the middle of it which is pretty impressive, and The West Wing is famous for its "walk and talk" scenes that follow several conversations through the hallways of the White House.

    • by jamienk ( 62492 )

      But why would an AI "shot" be limited by the "matching" issues of real film? You could just string together multiple shots, no? Using some kind of video ControlNet: https://stable-diffusion-art.c... [stable-diffusion-art.com]
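
      For the curious, here's a hedged sketch of the per-frame flavor of that idea: condition each generated frame on an edge map pulled from a reference clip so the composition stays put from frame to frame. It assumes the Hugging Face diffusers ControlNet pipeline in Python; the model names, frame directory, and prompt are illustrative, and this is one crude approach rather than any specific product's method.

        import glob
        import cv2
        import numpy as np
        import torch
        from PIL import Image
        from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

        # A canny-edge ControlNet steers composition; the base model fills in appearance.
        controlnet = ControlNetModel.from_pretrained(
            "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
        pipe = StableDiffusionControlNetPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5",
            controlnet=controlnet,
            torch_dtype=torch.float16,
        ).to("cuda")

        prompt = "a couple walking down a snowy city street, consistent outfits, film still"

        for i, path in enumerate(sorted(glob.glob("reference_frames/*.png"))):
            # Per-frame edge map extracted from the reference footage.
            gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
            edges = cv2.Canny(gray, 100, 200)
            control = Image.fromarray(np.stack([edges] * 3, axis=-1))
            # Reusing the same seed for every frame reduces flicker between frames.
            frame = pipe(prompt, image=control, num_inference_steps=20,
                         generator=torch.Generator("cuda").manual_seed(42)).images[0]
            frame.save(f"generated_{i:04d}.png")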

      In fact, AI video should in theory disrupt a lot of "film language" -- camera going through walls, infinite zooming, fitting through cracks, whatever else we can think of (thinking of the stuff will take a while...)

      At first, film thought of itself as just recordings of theatre.

      Listen to this guy's podcast (transc

      • by jamienk ( 62492 )

        Money section:

        Film affords new artistic possibilities. You are no longer limited to a static camera showing a fixed set, the way the audience of a theatre would be looking through the “fourth wall” of a room. You have many more options to convey things visually, instead of being limited to strongly articulated stage dialogs as the only driver of the plot.

        But many early movies didn’t take advantage of that. They just kept doing what they had always been doing at the theatre and just recorde

        • That is a really cool essay, thanks for pointing it out!

          BTW, many ancient Greek philosophers were against writing. Socrates is famous for that; he thought that storing speech in symbolic form would atrophy the memory.

          In many ways he was right. Many of us here on this site have no doubt seen the transition in the 70s-80s when kids stopped doing arithmetic on paper or in their heads and started doing it on a handheld calculator instead. Nowadays a typical kid at a cash register has no practical understanding of

      • by timeOday ( 582209 ) on Thursday February 15, 2024 @06:24PM (#64243194)

        In fact, AI video should in theory disrupt a lot of "film language" -- camera going through walls, infinite zooming, fitting through cracks, whatever else we can think of (thinking of the stuff will take a while...)

        Seems to me traditional CGI has a better chance at that, since it has an explicit model of the 3D geometry of a scene.

        In fact, games/CGI did do that. I used to think the 3rd person views in games looked strange. Then when drones and 360 cameras got good enough to create those shots, they looked just like the games predicted!

        AI video, on the other hand, is a model of existing 2D footage. So I think it would be hard for it to extrapolate to really unusual-looking perspectives or movements.

        • Wait until Disney/Pixar train an AI with the raw 3D data from all their previous movies, rather than just the rendered output, and get the AI to generate the model directly. I don't know why I haven't seen this kind of implementation yet (maybe I'm looking in the wrong place) but if I were developing these things, I'd be training it on the source files, not the finished output. Same with music - generate the raw audio tracks, channel and FX configuration, automation, etc. and get it to spit it out as a Cu
    • by ffkom ( 3519199 )

      If you're looking for a movie with really long shots, "Children of Men" has a continuous ~12 minute shot near the middle of it which is pretty impressive

      "Enter the Void", "Irreversible" and "1917" are basically entirely one-continuous-shot movies. Looks like Gaspar Noe is safe from the robots... for the moment.

    • The example videos actually show Sora generating multiple shots for a single video. I think the larger problem is similar to why it's difficult to get GPT-4 to write a novel or DALL-E to draw a comic book: it's hard to get current models to maintain consistency across a long time. You need some way to get characters and places to look recognizably the same in different shots, and that's not something our current technology handles well.

      On movies with long takes, Hitchcock's Rope [wikipedia.org] is a classic example. Birdma [wikipedia.org]

      • "it's hard to get current models to maintain consistency across a long time"

        It's hard to get them to maintain consistency from second to second. Take a look at the traffic on the left of the video in the article; vehicles appearing and disappearing like ghosts!
  • This is going to BF small companies like Pika. And either become a vital tool for Pixar, or wipe Pixar off the map in future despite their deep experience. It will eventually become a political tool too of course. Imagine having to spend lawyer money to counter faked videos harming your campaign. Some AI might become a reputation SWATting tool.
  • by musicon ( 724240 ) on Thursday February 15, 2024 @05:33PM (#64243076)
    It would be amazing if some day in the future we could take novels that have no chance of ever becoming a movie, and generate our own versions of those. Or, novels that were turned into movies, but dropped 2/3 of the detail in the book. Perhaps filmmaking jobs will change more to prompt engineering and manipulation.
    • by AmiMoJo ( 196126 )

      Or revive old cancelled TV series to produce new episodes and give us a satisfying ending.

      The tech is still in its infancy though. One of the sample videos is a woman doing some cooking, and a spoon magically appears and disappears from her mutant hand.

  • I've seen a number of animations made from diffusion models in the past, but nothing remotely as coherent as this.

  • Someone at OpenAI must like Kingdom Hearts.

  • The most impressive thing is that the usual suspects still haven't shown up in the Slashdot thread to tell us how unimpressed they are and how this is not "real AI". I had anticipated progress, but that they would shut up so soon defied even my most hopeful predictions.
