
Inception Emerges From Stealth With a New Type of AI Model
Inception, a Palo Alto-based AI company founded by Stanford professor Stefano Ermon, claims to have developed a novel diffusion-based large language model (DLM) that significantly outperforms traditional LLMs in speed and efficiency. "Inception's model offers the capabilities of traditional LLMs, including code generation and question-answering, but with significantly faster performance and reduced computing costs, according to the company," reports TechCrunch. From the report: Ermon hypothesized generating and modifying large blocks of text in parallel was possible with diffusion models. After years of trying, Ermon and a student of his achieved a major breakthrough, which they detailed in a research paper published last year. Recognizing the advancement's potential, Ermon founded Inception last summer, tapping two former students, UCLA professor Aditya Grover and Cornell professor Volodymyr Kuleshov, to co-lead the company. [...]
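For readers unfamiliar with the idea, the toy sketch below contrasts what "generating and modifying large blocks of text in parallel" means versus ordinary LLM decoding: an autoregressive model emits one token per forward pass, left to right, while a masked-diffusion-style decoder refines an entire block over a fixed number of denoising steps. The `model` callable, the greedy argmax updates, and the step count are illustrative assumptions, not Inception's actual algorithm.

    # Toy sketch, not Inception's method. `model` is a hypothetical callable that
    # returns per-position token logits of shape (batch, seq_len, vocab).
    import torch

    def autoregressive_decode(model, prompt_ids, new_tokens):
        # Standard LLM decoding: one forward pass per generated token, left to right.
        ids = prompt_ids.clone()
        for _ in range(new_tokens):
            logits = model(ids)
            next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
            ids = torch.cat([ids, next_id], dim=1)
        return ids

    def diffusion_style_decode(model, prompt_ids, new_tokens, steps=8, mask_id=0):
        # Start from an all-masked block and refine every position in parallel,
        # using a fixed number of passes regardless of block length.
        block = torch.full((prompt_ids.size(0), new_tokens), mask_id, dtype=prompt_ids.dtype)
        for _ in range(steps):
            logits = model(torch.cat([prompt_ids, block], dim=1))
            block = logits[:, -new_tokens:].argmax(dim=-1)
        return torch.cat([prompt_ids, block], dim=1)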
"What we found is that our models can leverage the GPUs much more efficiently," Ermon said, referring to the computer chips commonly used to run models in production. "I think this is a big deal. This is going to change the way people build language models." Inception offers an API as well as on-premises and edge device deployment options, support for model fine-tuning, and a suite of out-of-the-box DLMs for various use cases. The company claims its DLMs can run up to 10x faster than traditional LLMs while costing 10x less. "Our 'small' coding model is as good as [OpenAI's] GPT-4o mini while more than 10 times as fast," a company spokesperson told TechCrunch. "Our 'mini' model outperforms small open-source models like [Meta's] Llama 3.1 8B and achieves more than 1,000 tokens per second."
"What we found is that our models can leverage the GPUs much more efficiently," Ermon said, referring to the computer chips commonly used to run models in production. "I think this is a big deal. This is going to change the way people build language models." Inception offers an API as well as on-premises and edge device deployment options, support for model fine-tuning, and a suite of out-of-the-box DLMs for various use cases. The company claims its DLMs can run up to 10x faster than traditional LLMs while costing 10x less. "Our 'small' coding model is as good as [OpenAI's] GPT-4o mini while more than 10 times as fast," a company spokesperson told TechCrunch. "Our 'mini' model outperforms small open-source models like [Meta's] Llama 3.1 8B and achieves more than 1,000 tokens per second."
10x less?? (Score:4, Insightful)
"while costing 10x less"
Was it an AI who wrote that? Stuff doesn't cost "10x less"; it costs "90% less", or "one tenth" of something else. Duh!
Race to the commodity bottom (Score:2)
AI appears to be in the hype phase where its only path to profit is just being cheaper.
Reducing costs perpetually, whatever some S&P 500 executive teams believe, is not a sustainable long-term business plan. The same holds for AI.
Predict: a standardized test suite of 1,000,000 prompts, with per-model rankings for accuracy, memory/CPU usage, and electricity usage, in the near future.
Predict 2: Large Q&A sites for technical things will start to go out of business in 2026. With result that future technologies
Bah Humbug! (Score:3, Funny)
Re: (Score:2, Interesting)
Well, sure. Because so far this AI bubble is mostly unreliable hype.
Image generation models are impressive because there's no "right" and "wrong". There's just "close enough" or "not close enough". But LLMs are exactly that: language models. They're impressive language-parsing tools, but they're often applied to tasks that actually require precision, which is not what they're designed for.
The important next step - I think - is some kind of LFM: Large Fact Model. If we could tokenize facts and truths, and use LLMs to interface with those LFMs, that's when this stuff will become reliable.
Re: (Score:2)
Was that dress blue, or black again?
Re: (Score:3)
A generative model is set up to generate a different answer every time you run it. That's the point. You can make non-generative models with language front ends, that's not a problem. The problem with your "fact model" is figuring out what a fact is.
Since both humans and computers are pretty shit at that, I wouldn't hold my breath.
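To make the grandparent's point concrete, here is a trivial illustration (purely a toy, tied to no particular model): sampling from a probability distribution can give a different answer on every run, while looking up a stored fact is deterministic; the hard part, as noted above, is deciding what goes into the fact table in the first place.

    import random

    vocab = ["blue", "black", "gold", "white"]
    probs = [0.4, 0.3, 0.2, 0.1]

    def generative_answer():
        # sampling: a different answer is possible on every call
        return random.choices(vocab, weights=probs, k=1)[0]

    def lookup_answer(facts, key):
        # retrieval: the same answer every time, but someone had to decide the fact
        return facts[key]

    facts = {"dress_colour": "blue and black"}
    print(generative_answer(), "|", lookup_answer(facts, "dress_colour"))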
Re: Bah Humbug! (Score:2)
Re: (Score:2)
The important next step - I think - is some kind of LFM: Large Fact Model. If we could tokenize facts and truths, and use LLMs to interface with those LFMs, that's when this stuff will become reliable.
Facts no longer hold any significance. We need to tokenize bullshit. "He who screams their bullshit the loudest is the most correct" as the weighting system. Then we can have the AI run for office.
It's fast, but still limited to basic tasks (Score:5, Interesting)
I asked it to generate a transformer implementation for DeepSeek R1 and it spat out a whole lot of: // This is a placeholder for the actual implementation
Like other codegen models, it doesn't go much beyond basic, common coding tasks. Even for basic tasks in anything but JavaScript, the code doesn't compile cleanly.
Negatives aside, it is an interesting thesis, and I like the direction they're taking. I skimmed the paper, but I think DeepSeek's MoE approach tackles the same weight-distribution optimization in a more elegant way. In a nutshell, it's not the CPU or memory that's the limiting factor; it's that attention mechanisms jump around in memory and overload the bus I/O.
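For anyone wanting to put numbers on the "attention overloads the bus" point: during autoregressive decoding, every new token re-reads the entire KV cache, so memory traffic rather than arithmetic tends to dominate at long contexts. The model dimensions below are illustrative assumptions, not measurements of any specific model.

    # Illustrative model shape (assumed, not any specific model)
    layers, heads, head_dim = 32, 32, 128
    context = 8192            # tokens already in the KV cache
    bytes_per_val = 2         # fp16

    # Every decoded token re-reads keys and values for all layers and positions
    kv_bytes_per_token = layers * 2 * context * heads * head_dim * bytes_per_val
    print(f"~{kv_bytes_per_token / 1e9:.1f} GB of KV cache traffic per generated token "
          f"at a {context}-token context")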
LLMs and the Speed, Quality, and Cost Triangle (Score:2)
Parallel processing (Score:3)
"Ermon and a student of his" (Score:2)
I bet "student of his" worked his arse off while ermon was busy with conferences and mocktails...
Re: "Ermon and a student of his" (Score:2)
Phoar! The student REALLY worked hard on this! This thing is fast!!