
Ant Developing Large Language Model Technology (bloomberg.com)
Jack Ma-backed Ant Group is developing large language model technology that will power ChatGPT-style services, joining a list of Chinese companies seeking to win an edge in next-generation artificial intelligence. From a report: The project, known as "Zhen Yi," is being created by a dedicated unit and will deploy in-house research. An Ant spokesperson confirmed the news, which was first reported by Chinastarmarket.cn. Ant is racing against companies including its affiliate Alibaba Group Holding Ltd., Baidu, and SenseTime. Their efforts mirror developments in the US, where Alphabet's Google and Microsoft are exploring generative AI, which can create original content, from poetry to art, from simple user prompts.
Disappointment (Score:4, Funny)
Re: (Score:2)
Same. I thought our insectoid friends were finally breaking free of their oppressive caste system.
Re: (Score:2)
Me too! I was like huh? :O
Soon, everyone will have AI (Score:5, Interesting)
Gimp 3 is under development, but not available in the installation channels yet.
You can get a copy of the Gimp 3 development version as a flatpak from the website, then install GIMP-ML [github.io] from github, and add the "addons" directory of wherever you installed GIMP-ML to the Gimp 3 preferences. This adds 16 AI functions to Gimp: things like semantic segmentation, inpainting, face recognition, and so on. You will also need to download the large model weights.
It takes about 4 GB of downloads for everything (depending on what you might already have), and getting everything to work is not completely trivial yet. Gimp 3 is a moving target, and you will need to edit some of the python files and maybe move some things around, but any technically capable person should be able to figure it out.
Once Gimp 3 is released I expect this add-on will be very popular. AI will be everywhere.
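For reference, the setup described above looks roughly like this. The flatpak ref is the one GIMP's site documents for development builds; the GIMP-ML clone location and plug-in path are assumptions, so check the repo's README:

```shell
# Rough sketch of the Gimp 3 + GIMP-ML setup described above.
# The GIMP-ML clone location and plug-in directory are assumptions.
flatpak install https://flathub.org/beta-repo/appstream/org.gimp.GIMP.flatpakref
git clone https://github.com/kritiksoman/GIMP-ML.git ~/GIMP-ML
# Then in Gimp 3: Edit > Preferences > Folders > Plug-ins, and add the
# GIMP-ML addons directory (e.g. ~/GIMP-ML/gimpml/plugins).
# Model weights are downloaded separately (several GB).
```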
One criticism I have is with the quality. The Adobe photoshop generative fill [youtube.com] demos are very good, but I haven't been able to reproduce that quality with a local copy of stable diffusion. I suspect that the open source model weights are poor quality, and that Adobe has pruned, tuned, and enhanced their version for higher quality.
Generating an LLM is expensive. I saw one estimate of $75 million to generate a large model, but once you have the model anyone can "tune" it with specific data (LoRA tuning) to make "in the style of" whatever you tuned it with. Once you have the large model, tuning is cheap: a laptop with a beefy video chip can do it in a weekend.
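The reason LoRA tuning is so cheap is that it freezes the pretrained weights and trains only a small low-rank correction. Here's a toy numpy sketch of the idea (dimensions and scaling are illustrative, not any particular model's values):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 4096, 8, 16          # model dim, LoRA rank, scaling (illustrative)
W = rng.standard_normal((d, d))    # frozen pretrained weight, never updated

# Trainable low-rank factors. B starts at zero, so before any tuning
# the adapted model behaves exactly like the base model.
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))

def lora_forward(x):
    # Base output plus the scaled low-rank correction (alpha/r) * B @ (A @ x).
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size
lora_params = A.size + B.size
print(f"trainable: {lora_params} vs full {full_params} "
      f"({100 * lora_params / full_params:.2f}%)")
```

Only A and B (2·r·d values) receive gradients, a fraction of a percent of the full d·d matrix, which is why a single consumer GPU can handle it.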
The LLaMA model (the LLM made by Meta/Facebook) leaked online at the beginning of March, and in the intervening months the open source crowd made tremendous improvements to the technology; imagine something like 5 years of improvements [semianalysis.com] in 3 months. Note that LLaMA is a language model, not an image model, which may be why locally run stable diffusion still isn't as good as the professional versions.
I strongly suspect that by the end of this year some very high quality large AI models will be available to everyone: leaked or released.
Everyone and their dog will be making AI creations of image, video, and audio.
(And note with only a little bit of cynicism: just in time for the 2024 presidential election.)
I love the fact... (Score:5, Interesting)
... that anyone can do this. Maybe not with GPT-4 quality, but with the right tools and learning curve anyone can make their own models. I'm currently making a very lightweight (7B) summarizer model that takes long chunks of text and condenses them down to a specified target length (the goal is to be able to condense long contexts into their key points, which can then fit into the token limits of lightweight, user-runnable models).
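The chunking step can be sketched in a few lines; this approximates tokens by whitespace words (a real setup would use the model's actual tokenizer and limits):

```python
def chunk_text(text: str, max_tokens: int = 512) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-delimited words.

    A crude stand-in for a real tokenizer: each chunk is small enough to
    summarize on its own, and the summaries can then be concatenated and
    summarized again to fit a small model's context window.
    """
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

chunks = chunk_text("word " * 1200, max_tokens=512)
print(len(chunks))  # 3 chunks: 512 + 512 + 176 words
```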
I get a giggle out of watching long vacuous screeds get condensed down into just a sentence or two. For example:
Re: (Score:2)
To be clear, I haven't started the training yet, I'm just generating the training dataset... doing that with a 30B model, because you always want your training source to be a better model than what you're training. There are three parts to the training dataset:
* Good examples of accurate condensation to a specified length
* Accurate summaries, but wrong lengths
* Right lengths, but inaccurate summaries / bad takes.
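One way to encode those three example types (field names and labels here are hypothetical, not the poster's actual schema):

```python
# Hypothetical encoding of the three training-example types listed above.
def make_example(text, summary, target_len, accurate):
    return {
        "text": text,
        "summary": summary,
        "target_len": target_len,                 # requested length in words
        "actual_len": len(summary.split()),       # what the teacher produced
        "accurate": accurate,                     # faithful to the source?
    }

ex_good = make_example("long source ...", "a two sentence summary here", 5, True)
ex_wrong_len = make_example(
    "long source ...", "an accurate but far too long summary of the text", 5, True)
ex_bad_take = make_example("long source ...", "five words but wrong take", 5, False)

def label(ex):
    # Positive only when the summary is accurate AND hits the target length;
    # the two negative classes fail one condition each.
    return ex["accurate"] and ex["actual_len"] == ex["target_len"]
```

Keeping both negative classes separate lets the tuned model learn that length and accuracy are independent failure modes.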
Since even the 30B model isn't capable of targeting a specified length, I