Nvidia AI Image Generator Fits On a Floppy Disk and Takes 4 Minutes To Train (decrypt.co) 52

Posted by BeauHD on Tuesday August 01, 2023 @11:30PM from the less-is-more dept.

An anonymous reader quotes a report from Decrypt: In the rapidly evolving landscape of AI art creation tools, Nvidia researchers have introduced an innovative new text-to-image personalization method called Perfusion. But it's not a million-dollar super heavyweight model like its competitors. With a size of just 100KB and a 4-minute training time, Perfusion allows significant creative flexibility in portraying personalized concepts while maintaining their identity. Perfusion was presented in a research paper created by Nvidia and the Tel-Aviv University in Israel. Despite its small size, it's able to outperform leading AI art generators like Stability AI's Stable Diffusion v1.5, the newly released Stable Diffusion XL (SDXL), and MidJourney in terms of efficiency of specific editions.

The main new idea in Perfusion is called "Key-Locking." This works by connecting new concepts that a user wants to add, like a specific cat or chair, to a more general category during image generation. For example, the cat would be linked to the broader idea of a "feline." This helps avoid overfitting, which is when the model gets too narrowly tuned to the exact training examples. Overfitting makes it hard for the AI to generate new creative versions of the concept. By tying the new cat to the general notion of a feline, the model can portray the cat in many different poses, appearances, and surroundings. But it still retains the essential "catness" that makes it look like the intended cat, not just any random feline. So in simple terms, Key-Locking lets the AI flexibly portray personalized concepts while keeping their core identity. It's like giving an artist the following directions: "Draw my cat Tom, while sleeping, playing with yarn, and sniffing flowers."

Perfusion also enables multiple personalized concepts to be combined in a single image with natural interactions, unlike existing tools that learn concepts in isolation. Users can guide the image creation process through text prompts, merging concepts like a specific cat and chair. Perfusion offers a remarkable feature that lets users control the balance between visual fidelity (the image) and textual alignment (the prompt) during inference by adjusting a single 100KB model. This capability allows users to easily explore the Pareto front (text similarity vs image similarity) and select the optimal trade-off that suits their specific needs, all without the necessity of retraining. It's important to note that training a model requires some finesse. Focusing on reproducing the model too much leads to the model producing the same output over and over again and making it follow the prompt too closely with no freedom usually produces a bad result. The flexibility to tune how close the generator gets to the prompt is an important piece of customization

Nvidia AI Image Generator Fits On a Floppy Disk and Takes 4 Minutes To Train

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 52 Comments Log In/Create an Account

Comments Filter:

If it fits on a floppy disk (Score:5, Funny)

by Z00L00K ( 682162 ) writes: on Tuesday August 01, 2023 @11:33PM (#63732906) Homepage Journal

Then I'll have to dig up a floppy disk drive to make this work.

- Re: (Score:1)
  
  by angel'o'sphere ( 80593 ) writes:
  
  And probably a fancy USB connector (they exist).
- Re:If it fits on a floppy disk (Score:4, Interesting)
  
  by LordHighExecutioner ( 4245243 ) writes: on Wednesday August 02, 2023 @03:17AM (#63733182)
  
  It is 100 kB of code, so a 8" floppy disk driver should be enough...
  
  - Re: (Score:2)
    
    by Shipwack ( 684009 ) writes:
    
    I have a box of these if you need one...
- Re: (Score:3)
  
  by RogueWarrior65 ( 678876 ) writes:
  
  I found an old 5 1/4" floppy disk the other day. I think I'll send it to Nvidia.
Fantastic! (Score:3)

by plate_o_shrimp ( 948271 ) writes: on Tuesday August 01, 2023 @11:37PM (#63732910)

Where the fsck am I gonna find a floppy disk, not to mention a drive..

- Re:Fantastic! (Score:4, Funny)
  
  by kmoser ( 1469707 ) writes: on Wednesday August 02, 2023 @12:00AM (#63732940)
  
  You'll have to 3D print the "Save" icon.
  
  - Re: (Score:2)
    
    by war4peace ( 1628283 ) writes:
    
    On it...
  - Re: (Score:2)
    
    by quonset ( 4839537 ) writes:
    
    You'll have to 3D print the "Save" icon.
    You mean a vending machine with a beverage dispensed [ctvnews.ca].
    - Re: (Score:2)
      
      by Bitmanhome ( 254112 ) writes:
      
      Awesome, thanks for that.
- Re: (Score:2)
  
  by mspohr ( 589790 ) writes:
  
  Nvidia has only presented the research paper for now, promising to release the code soon.
Yeah sure (Score:3, Insightful)

by anoncoward69 ( 6496862 ) writes: on Tuesday August 01, 2023 @11:57PM (#63732930)

What fits on a floppy is probably some local client to run on your machine that hits up some API in the cloud with terrabytes of imagary to use for the image generation.

- Re: (Score:2)
  
  by Falos ( 2905315 ) writes:
  
  So it's about as bloaty as usual - 99% waste, a hundred times bigger than it needs to be.
  Rube Goldberg machine fits on ONE DOMINO you won't BELIEVE what happens next
- Re: (Score:2)
  
  by ufgrat ( 6245202 ) writes:
  
  What fits on the floppy, according to the article, is the training model, which could be as small as 100kb.
brain breakage (Score:3)

by CAIMLAS ( 41445 ) writes: on Tuesday August 01, 2023 @11:58PM (#63732932)

The title of this article, while genius, broke the brain of everyone over 30...

- Re: (Score:2)
  
  by DontBeAMoran ( 4843879 ) writes:
  
  Tell me about it. If it had been "Nvidia AI image generator fits on a floppy disk and takes 4 minutes to plane", I would've understood immediately!
  On the other hand, everyone under 30 is wondering what the fuck a "floppy disk" is.
- Re: (Score:2)
  
  by iMadeGhostzilla ( 1851560 ) writes:
  
  What breaks MY brain is, cool and all, but what is it for? Are there real world uses for generating (dream-like, bizarre) images from text? The only use case I've seen of those so far is sharing them on social media.
That's Awesome (Score:1)

by Your Anus ( 308149 ) writes:

Just need to find a floppy disk
- Re: That's Awesome (Score:2)
  
  by Anonymouse Cowtard ( 6211666 ) writes:
  
  It's funny how fast we post a comment alleging DUPE article, but we care not for DUPE comments.
  - Re: That's Awesome (Score:4, Insightful)
    
    by Joce640k ( 829181 ) writes: on Wednesday August 02, 2023 @02:05AM (#63733102) Homepage
    
    It can take a while to type a comment when you're as old as the people who still remember floppy disks.
    
  - Re: (Score:2)
    
    by spudnic ( 32107 ) writes:
    
    And it's not our job to check for dupes before posting headlines.
- Re: That's Awesome (Score:1)
  
  by BoloMKXXVIII ( 10309097 ) writes:
  
  Ask Boeing. I understand they still use them.
- Re: (Score:2)
  
  by mspohr ( 589790 ) writes:
  
  I have lots of floppy disks... 8", 5", 3.5" (and even have floppy drives in "storage" under the house.
  Yes, I'm old.
? somethinkg's missing here (Score:5, Insightful)

by John Cavendish ( 6659408 ) writes: on Wednesday August 02, 2023 @12:30AM (#63732976)

100kB of code and 4 minutes to train a diffusion text to image generator? Something's missing in the description - seems like it requires access to existing trained NNs otherwise it's almost a perpetuum mobile in the software engineering world.

- Re:? somethinkg's missing here (Score:5, Informative)
  
  by omnichad ( 1198475 ) writes: on Wednesday August 02, 2023 @01:03AM (#63733010) Homepage
  
  The summary says it:
  text-to-image personalization method
  The headline is just plain wrong. The model can't generate images on its own. It's a mini model to tune and customize the output of the main model.
  For example in advertising you could train the model with your product images and names and then reference then in your prompt without retraining the big model.
  
  - Re: (Score:2)
    
    by Pinky's Brain ( 1158667 ) writes:
    
    It's weird they don't mention parameter efficient finetuning any where in the paper, it's coined for pure language models but it does capture the essence of these kinds of approaches nicely.
    - Re: (Score:2)
      
      by null etc. ( 524767 ) writes:
      
      "In Perfusion, we lock the K pathway to the concept’s supercategory and use gated rank-1 editing instead of finetuning and subsequent optimization."
      Also, regarding 100KB: "It further enables inference-time combinations of concepts, and it has a small model size — roughly 100KB per concept." I wonder how many concepts will be needed for a typical, practical application.
Hmmm (Score:2)

by NoWayNoShapeNoForm ( 7060585 ) writes:

Like the old saying goes: If it sounds too good to be true...it probably is.
Real question... (Score:2)

by TwistedGreen ( 80055 ) writes:

But will it run on a 386?
- Re: (Score:3)
  
  by Joce640k ( 829181 ) writes:
  
  They recommend a 486DX2 with local bus graphics.
  - Re: (Score:2)
    
    by null etc. ( 524767 ) writes:
    
    Don't forget to enable turbo mode.
    - Re: (Score:2)
      
      by Anne Thwacks ( 531696 ) writes:
      
      Don't forget to enable turbo mode.
      Nobody ever disables turbo mode except to cheat at games.
      - Re: (Score:1)
        
        by Shaitan ( 22585 ) writes:
        
        False. Had to disable and further underclock a 486 once because it screwed up the delay loops in a ridiculously overpriced app that was written to control a dairy farm's milking machine.
"Add concept" and "Combine concepts" for SD (Score:2)

by Visarga ( 1071662 ) writes:

It's a tool on top of diffusion models to implement new concepts. Like if you want to generate a photoshoot, you need the product as a concept so it is generated correctly.

> Empirically, Perfusion not only leads to more accurate personalization at a fraction of the model size, but it also enables the use of more complex prompts and the combination of individually-learned concepts at inference time
- Re: (Score:1)
  
  by freedom4us ( 1828474 ) writes:
  
  Surely the title is a clickbait. But may be it sparks some interest in more efficient coding?
100k? (Score:5, Insightful)

by narcc ( 412956 ) writes: on Wednesday August 02, 2023 @03:35AM (#63733196) Journal

Like all AI articles, this one is purposefully misleading. No, a tiny 100k model does not "out-perform" massive models like Stable Diffusion. The qualifier the summary uses is "in terms of efficiency of specific editions", after listing multiple variations, is clearly intended to mislead!
In this case, the 100k is "bolted on" to an existing model. Shocking, I know. If you're familiar with LoRA you've got the basic idea, though they're using a variation of Rank-One Model Editing (ROME), which doesn't work the same way at all. All the magic is happening in the cross-attention mechanism.
So what are they doing? The idea here is to be able to quickly, inexpensively, and selectively customize a model using a few specific examples. If you want the model generate images using a specific piece of furniture, for example, this approach would let you do that.
So where's the novelty? They introduce the concept of "key-locking". If an encoding contains the target concept, keys will match those of the super category (which is just a single word) in the cross-attention mechanism while the values remain specific. It feels very much like hijacking. (All [whatsits] look like this now.) The other innovation is a non-linear gating mechanism that allows us to control the influence of our changes as well as combine multiple changes.

- - Re: (Score:2)
    
    by narcc ( 412956 ) writes:
    
    I'm a big believer in formal education. If you can manage it, poke around for an inexpensive grad cert. If that's not an option or you don't want to make that kind of investment, try going through some of the undergraduate textbooks. That's what they're for, after all.
    You could also try online classes on sites like udemy, but I don't know what kind of quality you'll find there.
    The trouble with learning independently is that you tend to overestimate the importance of topics you understand (or think you un
Floppy? (Score:1)

by henryc999 ( 9388475 ) writes:

Floppy disk? I only have a stiffy left, what am I supposed to do with all that extra?
- Re: (Score:1)
  
  by Motleypuss ( 10291831 ) writes:
  
  I miss stiffies. Felt so good to insert them into my Commodore Amiga and IBM PC. I haven't seen a stiffy for many years. Or a CD-ROM, for that matter. My current computer is Flash Disks and Internet connectivity the whole way. Sad times.
  - Re: (Score:1)
    
    by Shaitan ( 22585 ) writes:
    
    Those IBM PC's have taken more stiffies than a $2 hooker camping a strip club.
how many libraries of congress is that? (Score:2)

by youn ( 1516637 ) writes:

does anyone still have a floppy, a cd? heck, many laptops don't even any removable device, even sds
really annoying trend if you ask me (and yes, I am aware of thumb drives)
- Re: (Score:1)
  
  by Motleypuss ( 10291831 ) writes:
  
  Yeah, some of the older technologies really needed to stick around. I don't want to faff around with rotating a USB connector two or three times to get a mating (heh), when I could just jam a 3.5 inch disk into a drive and have it work first time (barring track seek errors).
  - Re: (Score:1)
    
    by Shaitan ( 22585 ) writes:
    
    I'd just carry a set of blanks and a CD with an assortment of floppy images. USB are much better than they were but their dynamic nature makes them a PITA to reliably boot from, so in that sense I actually do miss floppies and being able to consistently keep the drive higher in the boot order.
What's a.... (Score:3)

by craighansen ( 744648 ) writes: on Wednesday August 02, 2023 @08:51AM (#63733602) Journal

...floppy disk?
https://www.extremetech.com/ex... [extremetech.com]

- Re: (Score:2)
  
  by Waccoon ( 1186667 ) writes:
  
  Not as good as a floppy. The write protect tab doesn't work.
  God, do I miss being able to easily and reliably write-protect removable media. All the people constantly screaming about security never seem to care about that.
Misleading image comparisons (Score:1)

by Anonymous Coward writes:

The thing I find misleading is posting image comparisons of your model or model customization vs something else. Each time you run it something else pops out that can either be amazing or a total dud. For valid comparisons a sequence of dozens of images across without cherry picking or prompt engineering is needed to actually communicate a benefit.
Was it written in assembler? (Score:2)

by RogueWarrior65 ( 678876 ) writes:

Hell, people used to be able to make some pretty impressive programs when they worked in pure assembly language that fit into 16K.
- Re: (Score:1)
  
  by Shaitan ( 22585 ) writes:
  
  Yup, people say the compiler does a better job of optimizing than people can do by hand anyway. Also that macro optimizing provides better performance than micro. But you know what beats either? Doing both.
  I'm sure there is some truth to it but these claims just make me think of the horrifically bloated automatically generated output from frontpage, dreamweaver, publisher, etc. Yeah, it might have reached "good enough" that the benefit is almost universally not worth spending the extra time on but no automa

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

If it fits on a floppy disk (Score:5, Funny)

Re: (Score:1)

Re:If it fits on a floppy disk (Score:4, Interesting)

Re: (Score:2)

Re: (Score:3)

Fantastic! (Score:3)

Re:Fantastic! (Score:4, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Yeah sure (Score:3, Insightful)

Re: (Score:2)

Re: (Score:2)

brain breakage (Score:3)

Re: (Score:2)

Re: (Score:2)

That's Awesome (Score:1)

Re: That's Awesome (Score:2)

Re: That's Awesome (Score:4, Insightful)

Re: (Score:2)

Re: That's Awesome (Score:1)

Re: (Score:2)

? somethinkg's missing here (Score:5, Insightful)

Re:? somethinkg's missing here (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

Hmmm (Score:2)

Real question... (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

"Add concept" and "Combine concepts" for SD (Score:2)

Re: (Score:1)

100k? (Score:5, Insightful)

Re: (Score:2)

Floppy? (Score:1)

Re: (Score:1)

Re: (Score:1)

how many libraries of congress is that? (Score:2)

Re: (Score:1)

Re: (Score:1)

What's a.... (Score:3)

Re: (Score:2)

Misleading image comparisons (Score:1)

Was it written in assembler? (Score:2)

Re: (Score:1)

Related Links Top of the: day, week, month.

Slashdot Top Deals