Mistral Releases Pixtral 12B, Its First-Ever Multimodal AI Model
Mistral AI has launched Pixtral 12B, its first multimodal model with language and vision processing capabilities, positioning it to compete with AI leaders like OpenAI and Anthropic. You can download its source code from Hugging Face, GitHub, or via a torrent link. VentureBeat reports: While the official details of the new model, including the data it was trained upon, remain under wraps, the core idea appears to be that Pixtral 12B will allow users to analyze images while combining text prompts with them. So, ideally, one would be able to upload an image or provide a link to one and ask questions about the subjects in the file. The move is a first for Mistral, but it is important to note that multiple other models, including those from competitors like OpenAI and Anthropic, already have image-processing capabilities.
When an X user asked [Sophia Yang, the head of developer relations at the company] what makes the Pixtral 12-billion parameter model unique, she said it will natively support an arbitrary number of images of arbitrary sizes. As shared by initial testers on X, the 24GB model's architecture appears to have 40 layers, 14,336 hidden dimension sizes and 32 attention heads for extensive computational processing. On the vision front, it has a dedicated vision encoder with 1024x1024 image resolution support and 24 hidden layers for advanced image processing. This, however, can change when the company makes it available via API.
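As a rough sanity check on the "24GB" figure, the arithmetic below shows how 12 billion parameters map to file size at different precisions. This is only a sketch: the article does not say how the weights are stored, so the 2-bytes-per-parameter case (bf16 or fp16) is an assumption, and the helper name `weight_size_gb` is made up for illustration.

```python
# Back-of-the-envelope check of the "24GB" figure quoted above.
# Assumption (not stated in the article): weights are stored as 16-bit
# values, i.e. 2 bytes per parameter (bf16 or fp16).

def weight_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the raw weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

PARAMS = 12e9  # "12-billion parameter model", per the article

for label, nbytes in [("32-bit", 4), ("16-bit", 2), ("8-bit", 1)]:
    print(f"{label}: {weight_size_gb(PARAMS, nbytes):.0f} GB")
```

Under that assumption, the 16-bit case lands on 24 GB, which is consistent with the size reported by the initial testers. It counts weights only and ignores activations or any file-format overhead.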
AI image generation is stupid (Score:3)
You can't edit the images easily. At a minimum, the AI needs to produce the designs in layers as a PSD-like file.
Re: (Score:2)
I’d rather make simple changes myself than spend time crafting 100 prompts to get the AI to understand exactly what I want, especially since I can do it quicker than prompting for the right edits.
Re: (Score:2)
You can't edit the images easily.
Try doing it the other way around: create the image you want the "AI" to "fill in".
Re: (Score:3)
AI image generation doesn't work like that. It takes an existing image, often random noise, and progressively de-noises it to look like the desired prompt. There are no layers there.
However, you can use a tool called Segment Anything [github.com] to find regions that probably should be layers. (From any image, not just an AI image.) Here's a GIMP plugin [github.com] that probably does create those layers.
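To make the "start from noise and progressively de-noise" description concrete, here is a deliberately tiny toy loop. It is not how any real image generator is implemented: a real system uses a trained network to predict the clean image from the noisy one and the prompt, whereas this sketch cheats and uses a known target as that prediction. The function name `toy_denoise` and the 5-value "image" are made up for illustration.

```python
import random

def toy_denoise(target, steps=10, seed=0):
    """Start from random noise and step toward `target`.

    Returns the final values and the max per-pixel error after each step.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # the initial "image": pure noise
    errors = []
    for t in range(steps):
        remaining = steps - t
        # A real model would *predict* the clean image from x and the prompt;
        # this toy uses the known target as a stand-in for that prediction.
        x = [xi + (ti - xi) / remaining for xi, ti in zip(x, target)]
        errors.append(max(abs(xi - ti) for xi, ti in zip(x, target)))
    return x, errors

target = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical 5-pixel "image"
final, errors = toy_denoise(target)
print(errors[0], errors[-1])  # error shrinks from the first step to the last
```

Every step updates the whole array at once, which is the point made above: nothing in the process produces separate layers.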
Re: (Score:2)
The fact that you don't understand that this isn't an image generation model should raise questions for you about how much you understand the topic you're talking about at all.
Still a moron machine (Score:2)
Only in more dimensions.
Screwing up the naming convention (Score:2)
They used the x for their MIXture of expert models before, though I understand why they didn't name it pistral.