Best FLAN-T5 Alternatives in 2025
Find the top alternatives to FLAN-T5 currently available. Compare ratings, reviews, pricing, and features of FLAN-T5 alternatives in 2025. Slashdot lists the best FLAN-T5 alternatives on the market that offer competing products similar to FLAN-T5. Sort through the FLAN-T5 alternatives below to make the best choice for your needs.
-
1
Mistral 7B
Mistral AI
Free
Mistral 7B is a cutting-edge 7.3-billion-parameter language model designed to deliver superior performance, surpassing larger models like Llama 2 13B on multiple benchmarks. It leverages Grouped-Query Attention (GQA) for faster inference and Sliding Window Attention (SWA) to process longer text sequences efficiently. Released under the Apache 2.0 license, Mistral 7B is openly available for deployment across a wide range of environments, from local systems to major cloud platforms. Additionally, its fine-tuned variant, Mistral 7B Instruct, excels at instruction-following tasks, outperforming models such as Llama 2 13B Chat in guided responses and AI-assisted applications.
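Because the weights are published under Apache 2.0, the model can be pulled straight from the Hugging Face Hub. Below is a minimal local-inference sketch using the transformers library; the mistralai/Mistral-7B-Instruct-v0.1 repository id and the generation settings are illustrative assumptions, not part of the listing above.

# Minimal local-inference sketch for Mistral 7B Instruct (assumed repo id).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed Hub repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The instruct variant expects a chat-style prompt; apply_chat_template builds it.
messages = [{"role": "user", "content": "Summarize what Grouped-Query Attention does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-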
2
T5
Google
With T5, we propose reframing all NLP problems into a unified text-to-text format where the input and the output are always text strings. This is in contrast to BERT-style models, which can only output a class label or a span of the input. Our text-to-text framework allows us to use the same model and loss function on any NLP task, including machine translation, document summarization, question answering, and classification. We can even apply T5 to regression by training it to predict the string representation of a numeric value instead of the number itself.
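The text-to-text framing means one checkpoint handles different tasks purely through input prefixes. A brief sketch using the public t5-small checkpoint (chosen here only as a small, runnable illustration):

# Same model, different tasks, selected only by the text prefix.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

for prompt in [
    "translate English to German: The house is wonderful.",
    "summarize: The tower is 324 metres tall, about the same height as an 81-storey building.",
]:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
-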
3
Alpaca
Stanford Center for Research on Foundation Models (CRFM)
Instruction-following models such as GPT-3.5 (text-davinci-003), ChatGPT, Claude, and Bing Chat have become increasingly powerful. Many users now interact with these models, some even for work. However, despite their widespread deployment, instruction-following models still have many deficiencies: they can generate false information, propagate social stereotypes, and produce toxic language. It is vital that the academic community engages in order to make maximum progress on addressing these pressing issues. Unfortunately, doing research on instruction-following models in academia has been difficult, as there is no easily accessible model that comes close in capabilities to closed-source models such as OpenAI's text-davinci-003. We are releasing our findings about an instruction-following language model, dubbed Alpaca, which is fine-tuned from Meta's LLaMA 7B model. -
4
Llama 3.3
Meta
Free
Llama 3.3, the latest in the Llama series of language models, was developed to push the limits of AI-powered communication and understanding. With enhanced contextual reasoning, improved language generation, and advanced fine-tuning capabilities, Llama 3.3 is designed to deliver highly accurate responses across diverse applications. This version features a larger training dataset, refined algorithms for more nuanced understanding, and reduced bias compared to previous versions. Llama 3.3 excels at tasks such as multilingual communication, technical explanation, creative writing, and natural language understanding, making it an indispensable tool for researchers, developers, and businesses. Its modular architecture enables customization in specialized domains and ensures performance at scale. -
5
Phi-2
Microsoft
Phi-2 is a 2.7-billion-parameter language model that shows outstanding reasoning and language-understanding capabilities, demonstrating state-of-the-art performance among base language models with fewer than 13 billion parameters. On complex benchmarks, Phi-2 matches or even outperforms models up to 25x larger, thanks to innovations in model scaling. Its compact size makes it an ideal playground for researchers, including exploration of mechanistic interpretability, safety improvements, and fine-tuning experiments on a variety of tasks. We have included Phi-2 in the Azure AI Studio model catalog to encourage research and development of language models. -
6
Llama 2
Meta
Free
The next generation of our open-source large language model. This release includes model weights and starting code for pretrained and fine-tuned Llama language models, ranging from 7B to 70B parameters. Llama 2 was pretrained on 2 trillion tokens of publicly available online data and has double the context length of Llama 1. The fine-tuned Llama 2 models have additionally been trained on over 1 million human annotations. Llama 2 outperforms many other open-source language models on external benchmarks, including tests of reasoning, coding, proficiency, and knowledge. Llama 2-Chat, the fine-tuned version of the model, leverages publicly available instruction datasets and more than 1 million human annotations. We have a wide range of supporters around the world who are committed to our open approach to today's AI; these companies have provided early feedback and are excited to build with Llama 2. -
7
Gemini 1.5 Pro
Google
1 Rating
The Gemini 1.5 Pro AI model is a state-of-the-art language model that delivers highly accurate, context-aware, and human-like responses across a wide range of applications. It excels at natural language understanding, generation, and reasoning tasks, and has been fine-tuned to support tasks such as content creation, code generation, data analysis, and complex problem-solving. Its advanced algorithms allow it to adapt seamlessly to different domains, conversational styles, and languages. With its focus on scalability, Gemini 1.5 Pro is designed for both small-scale and enterprise-level implementations, making it a powerful tool for enhancing productivity and innovation. -
8
Smaug-72B
Abacus
Free
Smaug-72B is an open-source large language model (LLM) known for several key features. High performance: at release it ranked first on the Hugging Face Open LLM Leaderboard, surpassing models such as GPT-3.5 across a range of benchmarks, meaning it excels at understanding, responding to, and generating human-like text. Open source: unlike many other advanced LLMs, Smaug-72B is freely available for use and modification, fostering collaboration and innovation in the AI community. Focus on math and reasoning: it excels at mathematical and reasoning tasks, attributed to the unique fine-tuning techniques developed by Abacus, the creators of Smaug-72B. Based on Qwen-72B: it is a fine-tuned version of Qwen-72B, another powerful LLM released by Alibaba, and further improves on its capabilities. Smaug-72B represents a significant advance in open-source AI. -
9
Teuken 7B
OpenGPT-X
Free
Teuken-7B is a multilingual open source language model developed under the OpenGPT-X project, specifically designed to accommodate Europe's diverse linguistic landscape. It was trained on a dataset of which over 50% is non-English text, covering all 24 official European Union languages, to ensure robust performance across them. A key innovation is Teuken-7B's custom multilingual tokenizer, which has been optimized for European languages and improves training efficiency. The model comes in two versions: Teuken-7B-Base, a pre-trained foundational model, and Teuken-7B-Instruct, which has been tuned to better follow user prompts. Both versions are available on Hugging Face, promoting transparency and cooperation within the AI community. The development of Teuken-7B demonstrates a commitment to creating AI models that reflect Europe's diversity. -
10
BLOOM
BigScience
BLOOM is an autoregressive large language model trained to continue text from a prompt on vast amounts of text data, using industrial-scale computational resources. It can produce coherent text in 46 natural languages and 13 programming languages that is hard to distinguish from text written by humans. BLOOM can also perform text tasks it has not been explicitly trained for by casting them as text generation tasks.
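Casting a task as text generation just means writing the task into the prompt and letting the model continue it. A minimal sketch, assuming the small bigscience/bloom-560m checkpoint (chosen so the example runs on modest hardware):

# Zero-shot sentiment classification, cast as plain text continuation.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")

prompt = (
    "Review: 'The battery died after two days.'\n"
    "Sentiment (positive or negative):"
)
# The model was never trained on this task; the prompt frames it as continuation.
print(generator(prompt, max_new_tokens=3)[0]["generated_text"])
-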
11
GPT-4
OpenAI
GPT-4 (Generative Pre-trained Transformer 4) is a large-scale language model from OpenAI and the successor to GPT-3 in the GPT-n series of natural language processing models. It was trained on a very large corpus of text to produce human-like text generation and understanding abilities. Unlike many NLP models, GPT-4 does not depend on additional task-specific training data; it can generate text and answer questions using its own context. GPT-4 has been shown to perform a wide range of tasks without task-specific training, such as translation, summarization, and sentiment analysis.
-
12
LLaVA
LLaVA
Free
LLaVA is a multimodal model that combines a Vicuna language model with a vision encoder to enable comprehensive visual-language understanding. LLaVA's chat capabilities are impressive, emulating the multimodal behavior of models such as GPT-4. LLaVA-1.5 achieved state-of-the-art performance on 11 benchmarks using only publicly available data, completing training on a single node with eight A100 GPUs in about one day and beating methods that rely on billion-scale datasets. The development of LLaVA involved creating a multimodal instruction-following dataset generated with language-only GPT-4. This dataset comprises 158,000 unique language-image instruction-following samples spanning conversations, detailed descriptions, and complex reasoning tasks, and it has been crucial in training LLaVA for a wide range of visual and linguistic tasks. -
13
Llama 3.2
Meta
Free
There are now more versions of the open-source AI model that you can fine-tune, distill, and deploy anywhere. Llama 3.2 is a collection of pre-trained and instruction-tuned large language models (LLMs). The 1B and 3B sizes are multilingual, text-only models, while the 11B and 90B sizes accept both text and images as input and produce text. Our latest release lets you build highly efficient and performant applications: use the 1B and 3B models for on-device applications, such as summarizing a conversation from your phone or calling on-device features like the calendar, and use the 11B and 90B models to transform an existing image or get more information from a picture of your surroundings. -
14
Tülu 3
Ai2
Free
Tülu 3 is a cutting-edge instruction-following language model created by the Allen Institute for AI (Ai2), designed to enhance reasoning, coding, mathematics, knowledge retrieval, and safety. Built on the Llama 3 base model, Tülu 3 undergoes a four-stage post-training process that includes curated prompt synthesis, supervised fine-tuning, preference tuning with diverse datasets, and reinforcement learning to improve targeted skills with verifiable results. As an open-source model, it prioritizes transparency by providing access to training data, evaluation tools, and code, bridging the gap between open and proprietary AI fine-tuning techniques. Performance evaluations show that Tülu 3 surpasses other similarly sized open-weight models, including Llama 3.1-Instruct and Qwen2.5-Instruct, across multiple benchmarks. -
15
Code Llama
Meta
Free
Code Llama is a large language model (LLM) that can generate code from text prompts. It is the most advanced publicly available LLM for code tasks, with the potential to speed up workflows for developers and lower the barrier for people learning to code. Code Llama can be used as a productivity and educational tool to help programmers write more robust, well-documented software. A state-of-the-art LLM, it generates both code and natural language about code from either code or natural-language prompts, and it is free for research and commercial use. Built on Llama 2, Code Llama is available in three versions: Code Llama, the foundational code model; Code Llama - Python, specialized for Python; and Code Llama - Instruct, fine-tuned to follow natural language instructions.
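Since Code Llama follows the same interface as other Llama-family checkpoints, a plain text prompt is enough to get code back. A minimal sketch, assuming the codellama/CodeLlama-7b-hf repository on the Hugging Face Hub:

# Generate a code completion from a text prompt (assumed repo id).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # assumed Hub repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The model continues the function body from the signature and docstring.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
ids = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
-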
16
Yi-Lightning
Yi-Lightning
Yi-Lightning is the latest large language model developed by 01.AI under the leadership of Kai-Fu Lee, focused on high performance, cost-efficiency, and broad language coverage. It has a maximum context length of 16K tokens and costs $0.14 per million tokens for both input and output, making it very competitive. Yi-Lightning uses an enhanced Mixture-of-Experts architecture that incorporates fine-grained expert segmentation and advanced routing strategies to improve efficiency. The model has excelled across a variety of domains, achieving top rankings in categories such as Chinese, math, coding, and hard prompts in the Chatbot Arena, where it secured sixth position overall and ninth in style control. Its development included pre-training, supervised fine-tuning, and reinforcement learning from human feedback, ensuring both performance and safety, with optimizations for memory usage and inference speed. -
17
LongLLaMA
LongLLaMA
Free
This repository contains a research preview of LongLLaMA, a large language model capable of handling contexts of up to 256k tokens. LongLLaMA is built on the foundation of OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method; LongLLaMA-Code is built on the foundation of Code Llama. We release a smaller base variant of LongLLaMA (not instruction-tuned) under a permissive Apache 2.0 license, together with inference code supporting longer contexts on Hugging Face. Our model weights can serve as a drop-in replacement for LLaMA in existing implementations (for short contexts up to 2048 tokens). We also provide evaluation results and comparisons against the original OpenLLaMA models.
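Because the weights are drop-in compatible with LLaMA loaders, the published inference code can be exercised in a few lines. A sketch following the research preview's pattern; the syzymon/long_llama_3b repo id is an assumption to verify against the repository:

# Load the research-preview weights; trust_remote_code pulls in the FoT attention code.
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

model_id = "syzymon/long_llama_3b"  # assumed Hub repository for the base variant
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float32, trust_remote_code=True
)

# For short contexts (up to 2048 tokens) this behaves as a drop-in LLaMA replacement.
ids = tokenizer("The Focused Transformer method extends context by", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
-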
18
Ai2 OLMoE
The Allen Institute for Artificial Intelligence
Free
Ai2 OLMoE is an open-source mixture-of-experts language model that runs completely on device, so you can test the model in a private and secure environment. The app is designed to help researchers explore ways to improve on-device intelligence and to let developers quickly prototype AI experiences, all without cloud connectivity. OLMoE is the highly efficient mixture-of-experts model in the Ai2 OLMo family. Discover what real-world tasks are possible with state-of-the-art local models, learn how to improve AI models for small systems, test your own models using our open-source codebase, and integrate OLMoE with other iOS applications. Because it operates entirely on device, the Ai2 OLMoE application preserves privacy and security, and you can share your conversation output with friends and colleagues. Both the application code and the model are open source. -
19
Falcon-40B
Technology Innovation Institute (TII)
Free
Falcon-40B is a 40B-parameter causal decoder-only model built by TII, trained on 1,000B tokens of RefinedWeb enhanced with curated corpora, and released under the Apache 2.0 license. Why use Falcon-40B? It is among the best open-source models available, outperforming LLaMA, StableLM, RedPajama, MPT, and others on the OpenLLM Leaderboard. Its architecture is optimized for inference, with FlashAttention and multiquery attention. The Apache 2.0 license allows commercial use without restrictions or royalties. Note that this is a raw pretrained model, which should be fine-tuned for most use cases; if you are looking for a version that takes generic instructions in a chat format, we suggest Falcon-40B-Instruct. -
20
AI21 Studio
AI21 Studio
$29 per month
AI21 Studio provides API access to our Jurassic-1 large language models, which power text generation and comprehension features in thousands of live applications. Tackle any language task: our Jurassic-1 models can follow natural language instructions and need only a few examples to adapt to new tasks. Our APIs are a perfect fit for common tasks like paraphrasing and summarization, delivering superior results at a competitive price without reinventing the wheel. Need to fine-tune a custom model? You're just three clicks away: training is fast and affordable, and fine-tuned models are deployed immediately. Embed an AI co-writer in your app to give your users superpowers; features like paraphrasing, long-form draft generation, repurposing, and custom auto-complete can increase user engagement and help your product succeed.
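Access is over a plain HTTPS API. The sketch below posts a completion request with Python's requests library; the endpoint path, the j1-large model name, and the payload fields follow the pattern AI21 documented while Jurassic-1 was current, but treat them as assumptions to verify against the current docs:

# Minimal completion request against the AI21 Studio API (assumed endpoint and fields).
import os
import requests

resp = requests.post(
    "https://api.ai21.com/studio/v1/j1-large/complete",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['AI21_API_KEY']}"},
    json={
        "prompt": "Rewrite this more formally: gotta reschedule our meeting, sorry!",
        "maxTokens": 64,     # completion length cap
        "temperature": 0.7,  # sampling temperature
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["completions"][0]["data"]["text"])
-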
21
Azure OpenAI Service
Microsoft
$0.0004 per 1000 tokens
Use advanced coding and language models to solve a variety of problems. Build cutting-edge applications by leveraging large-scale generative AI models with deep understanding of language and code, enabling new reasoning and comprehension capabilities. These models can be applied to a variety of use cases, such as writing assistance, code generation, and reasoning over data. Access enterprise-grade Azure security, and detect and mitigate harmful use. Access generative models pretrained on trillions of words and apply them to new scenarios, including code, reasoning, inferencing, and comprehension. A simple REST API lets you customize the generative models with labeled data for your particular scenario, and you can fine-tune your model's hyperparameters to improve output accuracy. You can also use the API's few-shot learning capability to provide examples and obtain more relevant results.
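The REST API referenced above takes a deployment name in the URL and an api-key header. A minimal sketch of a completion call that also demonstrates few-shot prompting; the resource name, deployment name, and api-version value are placeholders you would swap for your own:

# Minimal Azure OpenAI completion call over the REST API (placeholder resource names).
import os
import requests

endpoint = "https://YOUR-RESOURCE.openai.azure.com"  # placeholder resource
deployment = "YOUR-DEPLOYMENT"                        # placeholder deployment name
url = f"{endpoint}/openai/deployments/{deployment}/completions?api-version=2023-05-15"

resp = requests.post(
    url,
    headers={"api-key": os.environ["AZURE_OPENAI_KEY"]},
    json={
        # Few-shot style prompt: examples first, then the query to complete.
        "prompt": "English: cat -> French: chat\nEnglish: dog -> French: chien\nEnglish: bird -> French:",
        "max_tokens": 5,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
-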
22
ChatGPT
OpenAI
ChatGPT is a language model developed by OpenAI. Trained on a wide range of internet text, it can generate human-like responses to a variety of prompts and can be used for natural language processing tasks such as conversation, question answering, and text generation. ChatGPT is a pretrained model that uses deep learning to generate text; because it was trained on large amounts of text data, it responds to a wide variety of prompts with human-like fluency. Its transformer architecture has proven effective across many NLP tasks. In addition to answering questions, ChatGPT can perform text classification and language translation, letting developers build powerful NLP applications that handle specific tasks more accurately. ChatGPT can also process and generate code.
-
23
DeepSeek-V2
DeepSeek
Free
DeepSeek-V2, developed by DeepSeek-AI, is a cutting-edge Mixture-of-Experts (MoE) language model designed for cost-effective training and high-speed inference. Boasting a massive 236 billion parameters—though only 21 billion are active per token—it efficiently handles a context length of up to 128K tokens. The model leverages advanced architectural innovations such as Multi-head Latent Attention (MLA) to optimize inference by compressing the Key-Value (KV) cache and DeepSeekMoE to enable economical training via sparse computation. Compared to its predecessor, DeepSeek 67B, it slashes training costs by 42.5%, shrinks the KV cache by 93.3%, and boosts generation throughput by 5.76 times. Trained on a vast 8.1 trillion token dataset, DeepSeek-V2 excels in natural language understanding, programming, and complex reasoning, positioning itself as a premier choice in the open-source AI landscape. -
24
PaLM 2
Google
PaLM 2 is Google's next-generation large language model, building on Google's legacy of research and development in machine learning. It excels at advanced reasoning tasks, including code and mathematics, classification and question answering, translation and multilingual proficiency, and natural language generation, performing better than previous state-of-the-art LLMs, including the original PaLM. It achieves this thanks to the way it was built: combining compute-optimal scaling, an improved dataset mixture, and model architecture improvements. PaLM 2 is grounded in Google's approach to building and deploying AI responsibly; it was rigorously evaluated for potential harms and biases as well as for its capabilities and downstream uses in research and products. It powers generative AI tools and features at Google such as Bard and the PaLM API, and underpins other state-of-the-art models like Sec-PaLM and Med-PaLM 2. -
25
Jurassic-2
AI21
$29 per month
Jurassic-2 is the latest generation of AI21 Studio's foundation models: a game-changer in the field of AI, with new capabilities and top-tier quality. Alongside it, we're releasing task-specific APIs with superior reading and writing capabilities. AI21 Studio's focus is on helping businesses and developers leverage reading and writing AI to build real-world, tangible products. The release of Jurassic-2 and the Task-Specific APIs marks two significant milestones that will enable you to bring generative AI into production. Jurassic-2 (or J2, as we like to call it) is the next generation of our foundation models, with significant improvements in quality and new capabilities including zero-shot instruction following, reduced latency, and multi-language support. The Task-Specific APIs give developers industry-leading endpoints for specialized reading and writing tasks. -
26
Qwen2.5-1M
Alibaba
Free
Qwen2.5-1M is an advanced open-source language model developed by the Qwen team, capable of handling up to one million tokens of context. This release introduces two upgraded variants, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, marking a significant expansion in Qwen's capabilities. To enhance efficiency, the team has also released an optimized inference framework built on vLLM, incorporating sparse attention techniques that accelerate processing speeds by 3x to 7x for long-context inputs. The update enables more efficient handling of extensive text sequences, making it ideal for complex tasks requiring deep contextual understanding. Additional insights into the model's architecture and performance improvements are detailed in the accompanying technical report.
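Serving a long-context checkpoint like this typically goes through vLLM. A brief sketch using the stock vllm package (not the Qwen team's sparse-attention-optimized fork, which may expose extra options); the Qwen/Qwen2.5-7B-Instruct-1M repo id and the max_model_len value are assumptions for illustration:

# Offline long-context inference via vLLM (assumed repo id; stock vLLM shown).
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct-1M",  # assumed Hub repository
    max_model_len=131072,                  # capped well below 1M to fit typical GPU memory
)

params = SamplingParams(temperature=0.7, max_tokens=128)
long_document = open("report.txt").read()  # placeholder long input
outputs = llm.generate([f"Summarize the key findings:\n\n{long_document}"], params)
print(outputs[0].outputs[0].text)
-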
27
Samsung Gauss
Samsung
Samsung Gauss is a new AI model developed by Samsung Electronics: a large language model (LLM) trained on a massive dataset. Samsung Gauss can generate text, translate between languages, create creative content, and answer questions in a helpful way. Although still in development, it has already mastered many tasks, including following instructions and completing requests with care; answering questions in an informative and comprehensive way, even when they are open-ended, challenging, or strange; and creating different creative text formats such as poems, code, musical pieces, emails, and letters. Some examples of what Samsung Gauss can do: translation, where it can translate text between many languages, including English, German, Spanish, Chinese, Japanese, and Korean; and coding, where it can generate code. -
28
Jurassic-1
AI21 Labs
Jurassic-1 comes in two sizes, of which the Jumbo version, at 178B parameters, is the largest and most sophisticated language model released for general use by developers. AI21 Studio, currently in open beta, allows anyone to sign up for the service and immediately begin querying Jurassic-1 through our API and interactive web environment. AI21 Labs' mission is to fundamentally change the way humans read and write by introducing machines as thought partners, and we can only achieve this if we take it on together. We have been researching language models since the Mesozoic Era (a.k.a. 2017); Jurassic-1 builds on this research and is the first generation of models we are making available for wide use. -
29
Qwen-7B
Alibaba
Free
Qwen-7B is the 7B-parameter variant of Qwen (abbreviation of Tongyi Qianwen), the large language model series proposed by Alibaba Cloud. Qwen-7B is a Transformer-based language model pretrained on a large volume of data, including web texts, books, and code. Qwen-7B is in turn used to train Qwen-7B-Chat, an AI assistant built on the large model with alignment techniques. Features of Qwen-7B include: pretraining on high-quality data, a self-constructed, large-scale dataset of over 2.2 trillion tokens containing plain texts and code and covering a wide range of general and professional domains; strong performance, outperforming competitors of comparable size on a series of benchmark datasets evaluating natural language understanding, mathematics, and coding; and more.
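The Chat variant ships its own chat helper via trust_remote_code. A minimal sketch following the pattern of the Qwen repository's published examples; the Qwen/Qwen-7B-Chat repo id and the chat() helper come from that repo, so treat the exact signatures as assumptions to verify:

# Chat with Qwen-7B-Chat via the repo-provided chat() helper (assumed per Qwen's README).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-7B-Chat"  # assumed Hub repository
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
).eval()

# history=None starts a fresh conversation; the returned history threads follow-ups.
response, history = model.chat(tokenizer, "Explain what alignment techniques do.", history=None)
print(response)
-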
30
ERNIE 3.0 Titan
Baidu
Pre-trained language models have achieved state-of-the-art results on various Natural Language Processing (NLP) tasks, and GPT-3 demonstrated that scaling up pre-trained language models can further exploit their immense potential. Recently, a framework named ERNIE 3.0 was proposed for pre-training large knowledge-enhanced models, and a model with 10 billion parameters was trained with it. ERNIE 3.0 outperformed the state-of-the-art models on a variety of NLP tasks. To explore the performance of scaling up ERNIE 3.0, we train a hundred-billion-parameter model called ERNIE 3.0 Titan with up to 260 billion parameters on the PaddlePaddle platform. We also design a self-supervised adversarial loss and a controllable language modeling loss to make ERNIE 3.0 Titan generate credible and controllable texts. -
31
Yi-Large
01.AI
$0.19 per 1M input tokens
Yi-Large is a proprietary large language model developed by 01.AI, with a 32k context window and input and output costs of $2 per million tokens. It is distinguished by its advanced capabilities in common-sense reasoning and multilingual support, performing on par with leading models such as GPT-4 and Claude 3 on various benchmarks. Yi-Large was designed for tasks that require complex inference, language understanding, and prediction, making it suitable for applications such as knowledge search, data classification, and chatbots. Its architecture is built on a decoder-only transformer with enhancements like pre-normalization and Grouped-Query Attention, and it was trained on a large, high-quality, multilingual dataset. The model's versatility, cost-efficiency, and global deployment potential make it a strong competitor in the AI market. -
32
Baichuan-13B
Baichuan Intelligent Technology
Free
Baichuan-13B is an open-source, commercially available large-scale language model with 13 billion parameters, developed by Baichuan Intelligent Technology following Baichuan-7B. It achieves the best results among models of its size on authoritative Chinese and English benchmarks. This release includes two versions: pretraining (Baichuan-13B-Base) and alignment (Baichuan-13B-Chat). Baichuan-13B has more data and a larger size: it expands the parameter count to 13 billion based on Baichuan-7B and was trained on 1.4 trillion tokens of high-quality corpus, 40% more than LLaMA-13B, making it the open-source 13B-class model with the most training data to date. It supports both Chinese and English, uses ALiBi position encoding, and has a context window of 4,096 tokens. -
33
Qwen2
Alibaba
Free
Qwen2 is an extensive series of large language models developed by the Qwen team at Alibaba Cloud. It includes both base models and instruction-tuned versions, with parameters ranging from 0.5 to 72 billion, and features both dense models and a Mixture-of-Experts model. The Qwen2 series is designed to surpass previous open-weight models, including its predecessor Qwen1.5, and to compete with proprietary models across a wide spectrum of benchmarks covering language understanding, generation, and multilingual capabilities. -
34
Qwen2.5-Max
Alibaba
Free
Qwen2.5-Max is an advanced Mixture-of-Experts (MoE) model from the Qwen team, trained on more than 20 trillion tokens and enhanced through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). It surpasses models like DeepSeek V3 in key benchmarks, including Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond, while also performing strongly in broader evaluations like MMLU-Pro. Available via API on Alibaba Cloud, Qwen2.5-Max can also be tested interactively through Qwen Chat, offering users a powerful tool for diverse AI-driven applications. -
35
Falcon 2
Technology Innovation Institute (TII)
Free
Falcon 2 11B is a cutting-edge open-source AI model designed for multilingual and multimodal tasks, and the only one in its class featuring vision-to-language capabilities. It outperforms Meta's Llama 3 8B and rivals Google's Gemma 7B, as verified by the Hugging Face Leaderboard. The next step in its evolution is integrating a Mixture of Experts framework to further elevate its performance and expand its capabilities. -
36
MPT-7B
MosaicML
Free
Introducing MPT-7B, the latest entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of roughly $200k. You can now train, fine-tune, and deploy your own private MPT models, either starting from one of our checkpoints or training from scratch. For inspiration, we are also releasing three fine-tuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which uses a context length of 65k tokens! -
37
Aya
Cohere AI
Aya is an open-source, state-of-the-art, massively multilingual large language research model (LLM) covering 101 different languages, more than twice as many as existing open-source models. Aya helps researchers unlock the powerful potential of LLMs for dozens of languages and cultures largely ignored by today's most advanced models. We are open-sourcing both the Aya model and the most comprehensive multilingual instruction dataset to date, with 513 million entries covering 114 languages. This data collection includes rare annotations from native and fluent speakers around the world, helping ensure that AI technology can effectively serve a global audience that has had limited access until now. -
38
Reka
Reka
Our enterprise-grade multimodal assistant is designed with privacy, efficiency, and security in mind. Yasa is trained to read text, images, and videos, with tabular data to be added in the future. Use it for creative tasks, to find answers to basic questions, or to gain insights from your data. With a few simple commands, you can generate, train, compress, or deploy the model on-premise. Our proprietary algorithms can customize the model for your data and use case: we use retrieval, fine-tuning, self-supervised instruction tuning, and reinforcement learning to tune the model on your datasets. -
39
Hermes 3
Nous Research
Free
Hermes 3 features advanced long-term context retention and multi-turn conversation capabilities, complex roleplaying and internal monologue abilities, and enhanced agentic function-calling. Our training data very aggressively encourages the model to follow system prompts and instructions exactly and in a highly adaptive manner. Hermes 3 was created by fine-tuning Llama 3.1 at the 8B, 70B, and 405B sizes on a dataset composed primarily of synthetically generated responses. The model boasts performance comparable to Llama 3.1 Instruct, but with deeper reasoning and creative abilities. Hermes 3 is an instruct and tool-use model series with strong reasoning and creativity. -
40
VideoPoet
Google
VideoPoet is a simple modeling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator. It is composed of a few components: an autoregressive language model that learns across video, image, audio, and text modalities to predict the next video or audio token in the sequence, and an LLM training framework that introduces a mixture of multimodal generative objectives, including text-to-video, text-to-image, image-to-video, video frame continuation, video inpainting and outpainting, video stylization, and video-to-audio. Moreover, these tasks can be composed together for additional zero-shot capabilities. This simple recipe shows that language models can synthesize and edit videos with a high degree of temporal consistency. -
41
Llama 3.1
Meta
Free
An open source AI model that you can fine-tune, distill, and deploy anywhere. Our latest instruction-tuned models are available in 8B, 70B, and 405B versions. Our open ecosystem lets you build faster with a variety of differentiated product offerings that support your use cases: choose between real-time and batch inference, download model weights for further cost-per-token optimization, adapt the model to your application, improve it with synthetic data, and deploy on-prem. Use Llama system components and extend the model with zero-shot tool use and RAG to build agentic behaviors, and use the 405B model to generate high-quality synthetic data to improve specialized models for specific use cases. -
42
DeepSeek-V3
DeepSeek
Free
1 Rating
DeepSeek-V3 is an advanced AI model built to excel in natural language comprehension, sophisticated reasoning, and decision-making across a wide range of applications. Harnessing innovative neural architectures and vast datasets, it offers exceptional capabilities for addressing complex challenges in fields like research, development, business analytics, and automation. Designed for both scalability and efficiency, DeepSeek-V3 empowers developers and organizations to drive innovation and unlock new possibilities with state-of-the-art AI solutions. -
43
Sky-T1
NovaSky
Free
Sky-T1-32B is an open-source reasoning model developed by the NovaSky group at UC Berkeley's Sky Computing Lab. It is comparable to proprietary models such as o1-preview on reasoning and coding benchmarks, yet it was trained for less than $450, demonstrating that high-level reasoning capabilities can be replicated cost-effectively. The model was fine-tuned from Qwen2.5-32B-Instruct on a curated dataset of 17,000 examples spanning diverse domains, including math and coding. Training took 19 hours on eight H100 GPUs using DeepSpeed ZeRO-3 offloading. Every aspect of the project, including the data, code, and model weights, is open source, allowing the academic and open-source communities to replicate and improve on its performance. -
44
Octave TTS
Hume AI
$3 per month
Hume AI introduced Octave, a text-to-speech engine that uses large language model understanding to interpret context. Unlike traditional TTS systems that merely read text aloud, Octave delivers lines with nuanced emotion based on their content. Users can create different AI voices using descriptive prompts such as "a sarcastic medieval peasant," allowing customized voice generation aligned to specific character traits or situations. Octave also lets users adjust the voice's emotional delivery and style through natural language commands; for example, instructions like "sound more enthusiastic" or "whisper fearfully" can be used to fine-tune the output. -
45
Chinchilla
Google DeepMind
Chinchilla is a large language model from Google DeepMind. It uses the same compute budget as Gopher but with 70B parameters and 4x as much data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a wide range of downstream evaluation tasks. Chinchilla also uses substantially less compute for fine-tuning and inference, greatly facilitating downstream use. Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, a greater than 7% improvement over Gopher. -
46
DeepSeek R2
DeepSeek
Free
DeepSeek R2 is poised to succeed DeepSeek R1, the revolutionary AI reasoning model introduced in January 2025 by the Chinese AI startup DeepSeek. R1 made waves in the industry with its cost-efficient performance, competing with top models like OpenAI's o1, and R2 is expected to push the boundaries even further. Designed for superior speed and human-like reasoning, it aims to excel in complex domains such as advanced programming and intricate mathematical problem-solving. By harnessing DeepSeek's cutting-edge Mixture-of-Experts framework and optimized training strategies, R2 is set to surpass its predecessor while maintaining efficiency. Additionally, it may extend its capabilities beyond English, broadening its reach. -
47
OpenEuroLLM
OpenEuroLLM
OpenEuroLLM is an initiative that brings together Europe's leading AI companies and research institutions to create a series of open-source foundation models for transparent AI in Europe. The project emphasizes transparency by sharing data, documentation, training and testing code, and evaluation metrics, encouraging community involvement. It ensures compliance with EU regulations and aims to provide large language models aligned with European standards. A focus on linguistic and cultural diversity extends multilingual capabilities to cover all official EU languages and beyond. The initiative aims to improve access to foundation models that can be fine-tuned for various applications, expand evaluation results across multiple languages, and increase the availability of training datasets. Transparency is maintained throughout the training process by sharing tools, methodologies, and intermediate results. -
48
Palmyra LLM
Writer
$18 per month
Palmyra is a suite of enterprise-ready Large Language Models that excel at tasks like image analysis, question answering, and support for more than 30 languages, with fine-tuning available for industries such as healthcare and finance. Palmyra models are notable for top rankings in benchmarks such as Stanford HELM and PubMedQA, and Palmyra Fin was the first model to pass the CFA Level III exam. Writer protects client data with a zero-data-retention policy and does not use it to train or modify its models. The suite includes specialized models such as Palmyra X 004 with tool-calling capabilities, Palmyra Med for healthcare, Palmyra Fin for finance, and Palmyra Vision for advanced image and video processing. These models are available through Writer's full-stack generative AI platform, which integrates graph-based retrieval-augmented generation (RAG). -
49
Defense Llama
Scale AI
Scale AI is pleased to announce Defense Llama, a Large Language Model (LLM) built on Meta's Llama 3 and customized and fine-tuned to support American national security missions. Available only in controlled U.S. government environments within Scale Donovan, Defense Llama empowers service members and national security professionals to apply the power of generative AI to their unique use cases, such as planning military or intelligence operations and understanding adversary vulnerabilities. Defense Llama was trained on a vast dataset including military doctrine, international humanitarian law, and relevant policies designed to align with Department of Defense (DoD) guidelines for armed conflict and the DoD's Ethical Principles for Artificial Intelligence, allowing the model to respond with accurate, meaningful, and relevant answers. Scale is proud to help U.S. national security personnel use generative AI for defense safely and securely. -
50
ChatGLM
Zhipu AI
Free
ChatGLM-6B is a Chinese-English bilingual dialogue model based on the General Language Model (GLM) architecture, with 6.2 billion parameters. Thanks to model quantization, users can deploy it locally on consumer-grade graphics cards (as little as 6GB of video memory at the INT4 quantization level). ChatGLM-6B uses technology similar to ChatGPT and is optimized for Chinese dialogue and Q&A. After bilingual training on approximately 1T Chinese and English tokens, supplemented by supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback, the 6.2-billion-parameter ChatGLM-6B is able to generate answers that align with human preferences.
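Local deployment follows the pattern published in the ChatGLM-6B repository; the THUDM/chatglm-6b-int4 repo id and the chat() helper come from that repo's README, so treat the exact signatures as assumptions to check against the current code:

# Run the INT4-quantized ChatGLM-6B locally (~6GB VRAM), per the repo's published pattern.
from transformers import AutoModel, AutoTokenizer

model_id = "THUDM/chatglm-6b-int4"  # pre-quantized variant (assumed repo id)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).half().cuda().eval()

# The repo-provided chat() helper threads conversation history across turns.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)  # the model answers Chinese or English prompts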