Best OPT Alternatives in 2025
Find the top alternatives to OPT currently available. Compare ratings, reviews, pricing, and features of OPT alternatives in 2025. Slashdot lists the best OPT alternatives on the market that offer competing products that are similar to OPT. Sort through OPT alternatives below to make the best choice for your needs
-
1
T5
Google
With T5, we propose re-framing all NLP into a unified format where the input and the output are always text strings. This is in contrast to BERT models which can only output a class label, or a span from the input. Our text-totext framework allows us use the same model and loss function on any NLP task. This includes machine translation, document summary, question answering and classification tasks. We can also apply T5 to regression by training it to predict a string representation of a numeric value instead of the actual number. -
2
Galactica
Meta
Information overload is a major barrier to scientific progress. The explosion of scientific literature and data makes it harder to find useful insights among a vast amount of information. Search engines are used to access scientific knowledge today, but they cannot organize it. Galactica is an extensive language model which can store, combine, and reason about scientific information. We train using a large corpus of scientific papers, reference material and knowledge bases, among other sources. We outperform other models in a variety of scientific tasks. Galactica performs better than the latest GPT-3 on technical knowledge probes like LaTeX Equations by 68.2% to 49.0%. Galactica is also good at reasoning. It outperforms Chinchilla in mathematical MMLU with a score between 41.3% and 35.7%. And PaLM 540B in MATH, with a score between 20.4% and 8.8%. -
3
Falcon-40B
Technology Innovation Institute (TII)
FreeFalcon-40B is a 40B parameter causal decoder model, built by TII. It was trained on 1,000B tokens from RefinedWeb enhanced by curated corpora. It is available under the Apache 2.0 licence. Why use Falcon-40B Falcon-40B is the best open source model available. Falcon-40B outperforms LLaMA, StableLM, RedPajama, MPT, etc. OpenLLM Leaderboard. It has an architecture optimized for inference with FlashAttention, multiquery and multiquery. It is available under an Apache 2.0 license that allows commercial use without any restrictions or royalties. This is a raw model that should be finetuned to fit most uses. If you're looking for a model that can take generic instructions in chat format, we suggest Falcon-40B Instruct. -
4
CodeQwen
QwenLM
FreeCodeQwen, developed by the Qwen Team, Alibaba Cloud, is the code version. It is a transformer based decoder only language model that has been pre-trained with a large number of codes. A series of benchmarks shows that the code generation is strong and that it performs well. Supporting long context generation and understanding with a context length of 64K tokens. CodeQwen is a 92-language coding language that provides excellent performance for text-to SQL, bug fixes, and more. CodeQwen chat is as simple as writing a few lines of code using transformers. We build the tokenizer and model using pre-trained methods and use the generate method for chatting. The chat template is provided by the tokenizer. Following our previous practice, we apply the ChatML Template for chat models. The model will complete the code snippets in accordance with the prompts without any additional formatting. -
5
Llama 2
Meta
FreeThe next generation of the large language model. This release includes modelweights and starting code to pretrained and fine tuned Llama languages models, ranging from 7B-70B parameters. Llama 1 models have a context length of 2 trillion tokens. Llama 2 models have a context length double that of Llama 1. The fine-tuned Llama 2 models have been trained using over 1,000,000 human annotations. Llama 2, a new open-source language model, outperforms many other open-source language models in external benchmarks. These include tests of reasoning, coding and proficiency, as well as knowledge tests. Llama 2 has been pre-trained using publicly available online data sources. Llama-2 chat, a fine-tuned version of the model, is based on publicly available instruction datasets, and more than 1 million human annotations. We have a wide range of supporters in the world who are committed to our open approach for today's AI. These companies have provided early feedback and have expressed excitement to build with Llama 2 -
6
PanGu-Σ
Huawei
The expansion of large language model has led to significant advancements in natural language processing, understanding and generation. This study introduces a new system that uses Ascend 910 AI processing units and the MindSpore framework in order to train a language with over one trillion parameters, 1.085T specifically, called PanGu-Sigma. This model, which builds on the foundation laid down by PanGu-alpha transforms the traditional dense Transformer model into a sparse model using a concept called Random Routed Experts. The model was trained efficiently on a dataset consisting of 329 billion tokens, using a technique known as Expert Computation and Storage Separation. This led to a 6.3 fold increase in training performance via heterogeneous computer. The experiments show that PanGu-Sigma is a new standard for zero-shot learning in various downstream Chinese NLP tasks. -
7
GPT-4 (Generative Pretrained Transformer 4) a large-scale, unsupervised language model that is yet to be released. GPT-4, which is the successor of GPT-3, is part of the GPT -n series of natural-language processing models. It was trained using a dataset of 45TB text to produce text generation and understanding abilities that are human-like. GPT-4 is not dependent on additional training data, unlike other NLP models. It can generate text and answer questions using its own context. GPT-4 has been demonstrated to be capable of performing a wide range of tasks without any task-specific training data, such as translation, summarization and sentiment analysis.
-
8
Qwen-7B
Alibaba
FreeQwen-7B, also known as Qwen-7B, is the 7B-parameter variant of the large language models series Qwen. Tongyi Qianwen, proposed by Alibaba Cloud. Qwen-7B, a Transformer-based language model, is pretrained using a large volume data, such as web texts, books, code, etc. Qwen-7B is also used to train Qwen-7B Chat, an AI assistant that uses large models and alignment techniques. The Qwen-7B features include: Pre-trained with high quality data. We have pretrained Qwen-7B using a large-scale, high-quality dataset that we constructed ourselves. The dataset contains over 2.2 trillion tokens. The dataset contains plain texts and codes and covers a wide range domains including general domain data as well as professional domain data. Strong performance. We outperform our competitors in a series benchmark datasets that evaluate natural language understanding, mathematics and coding. And more. -
9
Falcon-7B
Technology Innovation Institute (TII)
FreeFalcon-7B is a 7B parameter causal decoder model, built by TII. It was trained on 1,500B tokens from RefinedWeb enhanced by curated corpora. It is available under the Apache 2.0 licence. Why use Falcon-7B Falcon-7B? It outperforms similar open-source models, such as MPT-7B StableLM RedPajama, etc. It is a result of being trained using 1,500B tokens from RefinedWeb enhanced by curated corpora. OpenLLM Leaderboard. It has an architecture optimized for inference with FlashAttention, multiquery and multiquery. It is available under an Apache 2.0 license that allows commercial use without any restrictions or royalties. -
10
PanGu-α
Huawei
PanGu-a was developed under MindSpore, and trained on 2048 Ascend AI processors. The MindSpore Auto-parallel parallelism strategy was implemented to scale the training task efficiently to 2048 processors. This includes data parallelism as well as op-level parallelism. We pretrain PanGu-a with 1.1TB of high-quality Chinese data collected from a variety of domains in order to enhance its generalization ability. We test the generation abilities of PanGua in different scenarios, including text summarizations, question answering, dialog generation, etc. We also investigate the effects of model scaling on the few shot performances across a wide range of Chinese NLP task. The experimental results show that PanGu-a is superior in performing different tasks with zero-shot or few-shot settings. -
11
TinyLlama
TinyLlama
FreeThe TinyLlama Project aims to pretrain an 1.1B Llama on 3 trillion tokens. We can achieve this in "just" 90 day using 16 A100-40G graphics cards with some optimization. We used the exact same architecture and tokenizers as Llama 2 TinyLlama is compatible with many open-source Llama projects. TinyLlama has only 1.1B of parameters. This compactness allows TinyLlama to be used for a variety of applications that require a small computation and memory footprint. -
12
Baichuan-13B
Baichuan Intelligent Technology
FreeBaichuan-13B, a large-scale language model with 13 billion parameters that is open source and available commercially by Baichuan Intelligent, was developed following Baichuan -7B. It has the best results for a language model of the same size in authoritative Chinese and English benchmarks. This release includes two versions of pretraining (Baichuan-13B Base) and alignment (Baichuan-13B Chat). Baichuan-13B has more data and a larger size. It expands the number parameters to 13 billion based on Baichuan -7B, and trains 1.4 trillion coins on high-quality corpus. This is 40% more than LLaMA-13B. It is open source and currently the model with the most training data in 13B size. Support Chinese and English bi-lingual, use ALiBi code, context window is 4096. -
13
Cerebras-GPT
Cerebras
FreeThe training of state-of-the art language models is extremely difficult. They require large compute budgets, complex distributed computing techniques and deep ML knowledge. Few organizations are able to train large language models from scratch. The number of organizations that do not open source their results is increasing, even though they have the expertise and resources to do so. We at Cerebras believe in open access to the latest models. Cerebras is proud to announce that Cerebras GPT, a family GPT models with 111 million to thirteen billion parameters, has been released to the open-source community. These models are trained using the Chinchilla Formula and provide the highest accuracy within a given computing budget. Cerebras GPT has faster training times and lower training costs. It also consumes less power than any other publicly available model. -
14
ChatGPT is an OpenAI language model. It can generate human-like responses to a variety prompts, and has been trained on a wide range of internet texts. ChatGPT can be used to perform natural language processing tasks such as conversation, question answering, and text generation. ChatGPT is a pretrained language model that uses deep-learning algorithms to generate text. It was trained using large amounts of text data. This allows it to respond to a wide variety of prompts with human-like ease. It has a transformer architecture that has been proven to be efficient in many NLP tasks. ChatGPT can generate text in addition to answering questions, text classification and language translation. This allows developers to create powerful NLP applications that can do specific tasks more accurately. ChatGPT can also process code and generate it.
-
15
Llama
Meta
Llama (Large Language Model meta AI) is a state of the art foundational large language model that was created to aid researchers in this subfield. Llama allows researchers to use smaller, more efficient models to study these models. This further democratizes access to this rapidly-changing field. Because it takes far less computing power and resources than large language models, such as Llama, to test new approaches, validate other's work, and explore new uses, training smaller foundation models like Llama can be a desirable option. Foundation models are trained on large amounts of unlabeled data. This makes them perfect for fine-tuning for many tasks. We make Llama available in several sizes (7B-13B, 33B and 65B parameters), and also share a Llama card that explains how the model was built in line with our Responsible AI practices. -
16
ERNIE 3.0 Titan
Baidu
Pre-trained models of language have achieved state-of the-art results for various Natural Language Processing (NLP). GPT-3 has demonstrated that scaling up language models pre-trained can further exploit their immense potential. Recently, a framework named ERNIE 3.0 for pre-training large knowledge enhanced models was proposed. This framework trained a model that had 10 billion parameters. ERNIE 3.0 performed better than the current state-of-the art models on a variety of NLP tasks. In order to explore the performance of scaling up ERNIE 3.0, we train a hundred-billion-parameter model called ERNIE 3.0 Titan with up to 260 billion parameters on the PaddlePaddle platform. We also design a self supervised adversarial and a controllable model language loss to make ERNIE Titan generate credible texts. -
17
Megatron-Turing
NVIDIA
Megatron-Turing Natural Language Generation Model (MT-NLG) is the largest and most powerful monolithic English language model. It has 530 billion parameters. This 105-layer transformer-based MTNLG improves on the previous state-of-the art models in zero, one, and few shot settings. It is unmatched in its accuracy across a wide range of natural language tasks, including Completion prediction and Reading comprehension. NVIDIA has announced an Early Access Program for its managed API service in MT-NLG Mode. This program will allow customers to experiment with, employ and apply a large language models on downstream language tasks. -
18
ALBERT
Google
ALBERT is a Transformer model that can be self-supervised and was trained on large amounts of English data. It does not need manual labelling and instead uses an automated process that generates inputs and labels from the raw text. It is trained with two distinct goals in mind. Masked Language Modeling is the first. This randomly masks 15% words in an input sentence and requires that the model predict them. This technique is different from autoregressive models such as GPT and RNNs in that it allows the model learn bidirectional sentence representations. Sentence Ordering Prediction is the second objective. This involves predicting the order of two consecutive text segments during pretraining. -
19
Hippocratic AI
Hippocratic AI
Hippocratic AI, the new SOTA model, is outperforming GPT-4 in 105 of 114 healthcare certifications and exams. Hippocratic AI outperformed GPT-4 in 105 of 114 tests, outperforming by a margin greater than five percent on 74 certifications and by a larger margin on 43 certifications. Most language models are pre-trained on the common crawling of the Internet. This may include incorrect or misleading information. Hippocratic AI, unlike these LLMs is heavily investing in legally acquiring evidenced-based healthcare content. We use healthcare professionals to train the model and validate its readiness for deployment. This is called RLHF-HP. Hippocratic AI won't release the model until many of these licensed professionals have deemed it safe. -
20
Granite Code
IBM
FreeWe introduce the Granite family of decoder only code models for code generation tasks (e.g. fixing bugs, explaining codes, documenting codes), trained with code in 116 programming language. The Granite Code family has been evaluated on a variety of tasks and demonstrates that the models are consistently at the top of their game among open source code LLMs. Granite Code models have a number of key advantages. Granite Code models are able to perform at a competitive level or even at the cutting edge of technology in a variety of code-related tasks including code generation, explanations, fixing, translation, editing, and more. Demonstrating the ability to solve a variety of coding tasks. IBM's Corporate Legal team guides all models for trustworthy enterprise use. All models are trained using license-permissible datasets collected according to IBM's AI Ethics Principles. -
21
GPT-5
OpenAI
$0.0200 per 1000 tokensGPT-5 is OpenAI's Generative Pretrained Transformer. It is a large-language model (LLM), which is still in development. LLMs have been trained to work with massive amounts of text and can generate realistic and coherent texts, translate languages, create different types of creative content and answer your question in a way that is informative. It's still not available to the public. OpenAI has not announced a release schedule, but some believe it could launch in 2024. It's expected that GPT-5 will be even more powerful. GPT-4 has already proven to be impressive. It is capable of writing creative content, translating languages and generating text of human-quality. GPT-5 will be expected to improve these abilities, with improved reasoning, factual accuracy and ability to follow directions. -
22
Llama 3.2
Meta
FreeThere are now more versions of the open-source AI model that you can refine, distill and deploy anywhere. Choose from 1B or 3B, or build with Llama 3. Llama 3.2 consists of a collection large language models (LLMs), which are pre-trained and fine-tuned. They come in sizes 1B and 3B, which are multilingual text only. Sizes 11B and 90B accept both text and images as inputs and produce text. Our latest release allows you to create highly efficient and performant applications. Use our 1B and 3B models to develop on-device applications, such as a summary of a conversation from your phone, or calling on-device features like calendar. Use our 11B and 90B models to transform an existing image or get more information from a picture of your surroundings. -
23
Sparrow
DeepMind
Sparrow is a research model that serves as a proof of concept. It was created with the goal to train dialogue agents to be more helpful and correct. Sparrow helps us understand how to train agents to be more helpful and safer, and ultimately to help create safer and more useful artificial intelligence (AGI). Sparrow is currently not available for public use. Because it is difficult to determine what makes a conversation successful, training conversational AI can be a challenging problem. We use reinforcement learning (RL) to address this problem. This is a form that uses people's feedback and the preference feedback of study participants to train a model about how useful an answer is. We show participants multiple models of the same question, and ask them which one they prefer. -
24
Sky-T1
NovaSky
FreeSky-T1-32B is an open-source reasoning tool developed by the NovaSky group at UC Berkeley’s Sky Computing Lab. It is comparable to proprietary models such as o1 preview on reasoning and coding tests, but was trained for less than $450. This shows the feasibility of cost-effective high-level reasoning abilities. The model was fine-tuned using Qwen2.5 32B-Instruct and a curated dataset with 17,000 examples from diverse domains including math and coding. The training took 19 hours using eight H100 GPUs and DeepSpeed Zero-3 offloading. All aspects of the project are open-source including the data, code and model weights. This allows the academic and open source communities to duplicate and enhance the performance. -
25
Stable LM
Stability AI
FreeStableLM: Stability AI language models StableLM builds upon our experience with open-sourcing previous language models in collaboration with EleutherAI. This nonprofit research hub. These models include GPTJ, GPTNeoX and the Pythia Suite, which were all trained on The Pile dataset. Cerebras GPT and Dolly-2 are two recent open-source models that continue to build upon these efforts. StableLM was trained on a new dataset that is three times bigger than The Pile and contains 1.5 trillion tokens. We will provide more details about the dataset at a later date. StableLM's richness allows it to perform well in conversational and coding challenges, despite the small size of its dataset (3-7 billion parameters, compared to GPT-3's 175 billion). The development of Stable LM 3B broadens the range of applications that are viable on the edge or on home PCs. This means that individuals and companies can now develop cutting-edge technologies with strong conversational capabilities – like creative writing assistance – while keeping costs low and performance high. -
26
Codestral Mamba
Mistral AI
Codestral Mamba is a Mamba2 model that specializes in code generation. It is available under the Apache 2.0 license. Codestral Mamba represents another step in our efforts to study and provide architectures. We hope that it will open up new perspectives in architecture research. Mamba models have the advantage of linear inference of time and the theoretical ability of modeling sequences of unlimited length. Users can interact with the model in a more extensive way with rapid responses, regardless of the input length. This efficiency is particularly relevant for code productivity use-cases. We trained this model with advanced reasoning and code capabilities, enabling the model to perform at par with SOTA Transformer-based models. -
27
MPT-7B
MosaicML
FreeIntroducing MPT-7B - the latest addition to our MosaicML Foundation Series. MPT-7B, a transformer that is trained from scratch using 1T tokens of code and text, is the latest entry in our MosaicML Foundation Series. It is open-source, available for commercial purposes, and has the same quality as LLaMA-7B. MPT-7B trained on the MosaicML Platform in 9.5 days, with zero human interaction at a cost $200k. You can now train, fine-tune and deploy your private MPT models. You can either start from one of our checkpoints, or you can start from scratch. For inspiration, we are also releasing three finetuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which uses a context length of 65k tokens! -
28
NVIDIA NeMo Megatron
NVIDIA
NVIDIA NeMo megatron is an end to-end framework that can be used to train and deploy LLMs with billions or trillions of parameters. NVIDIA NeMo Megatron is part of the NVIDIAAI platform and offers an efficient, cost-effective, and cost-effective containerized approach to building and deploying LLMs. It is designed for enterprise application development and builds upon the most advanced technologies of NVIDIA research. It provides an end-to–end workflow for automated distributed processing, training large-scale customized GPT-3 and T5 models, and deploying models to infer at scale. The validation of converged recipes that allow for training and inference is a key to unlocking the power and potential of LLMs. The hyperparameter tool makes it easy to customize models. It automatically searches for optimal hyperparameter configurations, performance, and training/inference for any given distributed GPU cluster configuration. -
29
InstructGPT
OpenAI
$0.0200 per 1000 tokensInstructGPT is an open source framework that trains language models to generate natural language instruction from visual input. It uses a generative, pre-trained transformer model (GPT) and the state of the art object detector Mask R-CNN to detect objects in images. Natural language sentences are then generated that describe the image. InstructGPT has been designed to be useful in all domains including robotics, gaming, and education. It can help robots navigate complex tasks using natural language instructions or it can help students learn by giving descriptive explanations of events or processes. -
30
Azure OpenAI Service
Microsoft
$0.0004 per 1000 tokensYou can use advanced language models and coding to solve a variety of problems. To build cutting-edge applications, leverage large-scale, generative AI models that have deep understandings of code and language to allow for new reasoning and comprehension. These coding and language models can be applied to a variety use cases, including writing assistance, code generation, reasoning over data, and code generation. Access enterprise-grade Azure security and detect and mitigate harmful use. Access generative models that have been pretrained with trillions upon trillions of words. You can use them to create new scenarios, including code, reasoning, inferencing and comprehension. A simple REST API allows you to customize generative models with labeled information for your particular scenario. To improve the accuracy of your outputs, fine-tune the hyperparameters of your model. You can use the API's few-shot learning capability for more relevant results and to provide examples. -
31
CodeGemma
Google
CodeGemma consists of powerful lightweight models that are capable of performing a variety coding tasks, including fill-in the middle code completion, code creation, natural language understanding and mathematical reasoning. CodeGemma offers 3 variants: a 7B model that is pre-trained to perform code completion, code generation, and natural language-to code chat. A 7B model that is instruction-tuned for instruction following and natural language-to code chat. You can complete lines, functions, or even entire blocks of code whether you are working locally or with Google Cloud resources. CodeGemma models are trained on 500 billion tokens primarily of English language data taken from web documents, mathematics and code. They generate code that is not only syntactically accurate but also semantically meaningful. This reduces errors and debugging times. -
32
OLMo 2
Ai2
OLMo 2 is an open language model family developed by the Allen Institute for AI. It provides researchers and developers with open-source code and reproducible training recipes. These models can be trained with up to 5 trillion tokens, and they are competitive against other open-weight models such as Llama 3.0 on English academic benchmarks. OLMo 2 focuses on training stability by implementing techniques that prevent loss spikes in long training runs. It also uses staged training interventions to address capability deficits during late pretraining. The models incorporate the latest post-training methods from AI2's Tulu 3 resulting in OLMo 2-Instruct. The Open Language Modeling Evaluation System, or OLMES, was created to guide improvements throughout the development stages. It consists of 20 evaluation benchmarks assessing key capabilities. -
33
mT5
Google
FreeMultilingual T5 is a massively pretrained text-totext transformer model that has been trained using a similar recipe to T5. This repo can used to reproduce the experiments described in the mT5 article. The mC4 corpus covers 101 languages. Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, and more. -
34
Mistral NeMo
Mistral AI
FreeMistral NeMo, our new best small model. A state-of the-art 12B with 128k context and released under Apache 2.0 license. Mistral NeMo, a 12B-model built in collaboration with NVIDIA, is available. Mistral NeMo has a large context of up to 128k Tokens. Its reasoning, world-knowledge, and coding precision are among the best in its size category. Mistral NeMo, which relies on a standard architecture, is easy to use. It can be used as a replacement for any system that uses Mistral 7B. We have released Apache 2.0 licensed pre-trained checkpoints and instruction-tuned base checkpoints to encourage adoption by researchers and enterprises. Mistral NeMo has been trained with quantization awareness to enable FP8 inferences without performance loss. The model was designed for global applications that are multilingual. It is trained in function calling, and has a large contextual window. It is better than Mistral 7B at following instructions, reasoning and handling multi-turn conversation. -
35
Teuken 7B
OpenGPT-X
FreeTeuken-7B, a multilingual open source language model, was developed under the OpenGPT-X project. It is specifically designed to accommodate Europe's diverse linguistic landscape. It was trained on a dataset that included over 50% non-English text, covering all 24 official European Union languages, to ensure robust performance. Teuken-7B's custom multilingual tokenizer is a key innovation. It has been optimized for European languages and enhances training efficiency. The model comes in two versions: Teuken-7B Base, a pre-trained foundational model, and Teuken-7B Instruct, a model that has been tuned to better follow user prompts. Hugging Face makes both versions available, promoting transparency and cooperation within the AI community. The development of Teuken-7B demonstrates a commitment to create AI models that reflect Europe’s diversity. -
36
RoBERTa
Meta
FreeRoBERTa is based on BERT's language-masking strategy. The system learns to predict hidden sections of text in unannotated language examples. RoBERTa was implemented in PyTorch and modifies key hyperparameters of BERT. This includes removing BERT’s next-sentence-pretraining objective and training with larger mini-batches. This allows RoBERTa improve on the masked-language modeling objective, which is comparable to BERT. It also leads to improved downstream task performance. We are also exploring the possibility of training RoBERTa with a lot more data than BERT and for a longer time. We used both existing unannotated NLP data sets as well as CC-News which was a new set of public news articles. -
37
Alpaca
Stanford Center for Research on Foundation Models (CRFM)
Instruction-following models such as GPT-3.5 (text-DaVinci-003), ChatGPT, Claude, and Bing Chat have become increasingly powerful. These models are now used by many users, and some even for work. However, despite their widespread deployment, instruction-following models still have many deficiencies: they can generate false information, propagate social stereotypes, and produce toxic language. It is vital that the academic community engages in order to make maximum progress towards addressing these pressing issues. Unfortunately, doing research on instruction-following models in academia has been difficult, as there is no easily accessible model that comes close in capabilities to closed-source models such as OpenAI's text-DaVinci-003. We are releasing our findings about an instruction-following language model, dubbed Alpaca, which is fine-tuned from Meta's LLaMA 7B model. -
38
Jamba
AI21 Labs
Jamba is a powerful and efficient long context model that is open to builders, but built for enterprises. Jamba's latency is superior to all other leading models of similar size. Jamba's 256k window is the longest available. Jamba's Mamba Transformer MoE Architecture is designed to increase efficiency and reduce costs. Jamba includes key features from OOTB, including function calls, JSON output, document objects and citation mode. Jamba 1.5 models deliver high performance throughout the entire context window. Jamba 1.5 models score highly in common quality benchmarks. Secure deployment tailored to your enterprise. Start using Jamba immediately on our production-grade SaaS Platform. Our strategic partners can deploy the Jamba model family. For enterprises who require custom solutions, we offer VPC and on-premise deployments. We offer hands-on management and continuous pre-training for enterprises with unique, bespoke needs. -
39
Phi-2
Microsoft
Phi-2 is a 2.7-billion-parameter language-model that shows outstanding reasoning and language-understanding capabilities. It represents the state-of-the art performance among language-base models with less than thirteen billion parameters. Phi-2 can match or even outperform models 25x larger on complex benchmarks, thanks to innovations in model scaling. Phi-2's compact size makes it an ideal playground for researchers. It can be used for exploring mechanistic interpretationability, safety improvements or fine-tuning experiments on a variety tasks. We have included Phi-2 in the Azure AI Studio catalog to encourage research and development of language models. -
40
Gemma
Google
Gemma is the family of lightweight open models that are built using the same research and technology as the Gemini models. Gemma was developed by Google DeepMind, along with other teams within Google. The name is derived from the Latin gemma meaning "precious stones". We're also releasing new tools to encourage developer innovation, encourage collaboration, and guide responsible use of Gemma model. Gemma models are based on the same infrastructure and technical components as Gemini, Google's largest and most powerful AI model. Gemma 2B, 7B and other open models can achieve the best performance possible for their size. Gemma models can run directly on a desktop or laptop computer for developers. Gemma is able to surpass much larger models in key benchmarks, while adhering our rigorous standards of safe and responsible outputs. -
41
LongLLaMA
LongLLaMA
FreeThis repository contains a research preview of LongLLaMA. It is a large language-model capable of handling contexts up to 256k tokens. LongLLaMA was built on the foundation of OpenLLaMA, and fine-tuned with the Focused Transformer method. LongLLaMA code was built on the foundation of Code Llama. We release a smaller base variant of the LongLLaMA (not instruction-tuned) on a permissive licence (Apache 2.0), and inference code that supports longer contexts for hugging face. Our model weights are a drop-in replacement for LLaMA (for short contexts up to 2048 tokens) in existing implementations. We also provide evaluation results, and comparisons with the original OpenLLaMA model. -
42
OpenELM
Apple
OpenELM is a family of open-source language models developed by Apple. It uses a layering strategy to allocate parameters efficiently within each layer of a transformer model. This leads to improved accuracy compared to other open language models. OpenELM was trained using publicly available datasets, and it achieves the best performance for its size. -
43
With just a few lines, you can integrate natural language understanding and generation into the product. The Cohere API allows you to access models that can read billions upon billions of pages and learn the meaning, sentiment, intent, and intent of every word we use. You can use the Cohere API for human-like text. Simply fill in a prompt or complete blanks. You can create code, write copy, summarize text, and much more. Calculate the likelihood of text, and retrieve representations from your model. You can filter text using the likelihood API based on selected criteria or categories. You can create your own downstream models for a variety of domain-specific natural languages tasks by using representations. The Cohere API is able to compute the similarity of pieces of text and make categorical predictions based on the likelihood of different text options. The model can see ideas through multiple lenses so it can identify abstract similarities between concepts as distinct from DNA and computers.
-
44
Jurassic-1
AI21 Labs
Jurassic-1 comes in two sizes. The Jumbo version is the most advanced language model, with 178B parameters. It was released to developers for general use. AI21 Studio, currently in open beta allows anyone to sign up for the service and immediately begin querying Jurassic-1 with our API and interactive website environment. AI21 Labs' mission is to fundamentally change the way humans read and compose by introducing machines as partners in thought. We can only achieve this if we work together. Since the Mesozoic Era, or 2017, we have been researching language models. Jurassic-1 is based on this research and is the first generation we are making available to wide use. -
45
PygmalionAI
PygmalionAI
FreePygmalionAI, a community of open-source projects based upon EleutherAI’s GPT-J 6B models and Meta’s LLaMA model, was founded in 2009. Pygmalion AI is designed for roleplaying and chatting. The 7B variant of the Pygmalion AI is currently actively supported. It is based on Meta AI’s LLaMA AI model. Pygmalion's chat capabilities are superior to larger language models that require much more resources. Our curated datasets of high-quality data on roleplaying ensure that your bot is the best RP partner. The model weights as well as the code used to train the model are both open-source. You can modify/re-distribute them for any purpose you like. Pygmalion and other language models run on GPUs because they require fast memory and massive processing to produce coherent text at a reasonable speed. -
46
ChatGLM
Zhipu AI
FreeChatGLM-6B, a Chinese-English bilingual dialogue model based on General Language Model architecture (GLM), has 6.2 billion parameters. Users can deploy model quantization locally on consumer-grade graphic cards (only 6GB video memory required at INT4 quantization levels). ChatGLM-6B is based on technology similar to ChatGPT and optimized for Chinese dialogue and Q&A. After approximately 1T identifiers for Chinese and English bilingual training and supplemented with supervision and fine-tuning as well as feedback self-help and human feedback reinforcement learning, ChatGLM-6B, with 6.2 billion parameters, has been able generate answers that are in line with human preference. -
47
Qwen
Alibaba
FreeQwen LLM is a family of large-language models (LLMs), developed by Damo Academy, an Alibaba Cloud subsidiary. These models are trained using a large dataset of text and codes, allowing them the ability to understand and generate text that is human-like, translate languages, create different types of creative content and answer your question in an informative manner. Here are some of the key features of Qwen LLMs. Variety of sizes: Qwen's series includes sizes ranging from 1.8 billion parameters to 72 billion, offering options that meet different needs and performance levels. Open source: Certain versions of Qwen have open-source code, which is available to anyone for use and modification. Qwen is multilingual and can translate multiple languages including English, Chinese and Japanese. Qwen models are capable of a wide range of tasks, including text summarization and code generation, as well as generation and translation. -
48
OpenScholar
Ai2
Ai2 OpenScholar, a collaboration between the University of Washington's Allen Institute for AI and the University of Washington, is designed to help scientists navigate and synthesize the vast expanse of the scientific literature. OpenScholar uses a retrieval-augmented model of language to answer user queries. It does this by identifying relevant papers and then generating answers based on those sources. This ensures that information is accurate and linked directly to existing research. OpenScholar-8B set new standards for factuality and accuracy of citations on the ScholarQABench benchmark. OpenScholar-8B, for example, maintains a solid grounding in real retrieved articles in the biomedical domain. This is in contrast to models like GPT-4 which tend to hallucinate references. Twenty scientists from computer science, biomedicine and physics evaluated OpenScholar's answers against expert-written responses to evaluate its real-world application. -
49
ESMFold
Meta
FreeESMFold demonstrates how AI can provide new tools for understanding the natural world. It is similar to the microscope which allowed us to see the world at a tiny scale and gave us a new understanding of the world. AI can help us see biology in a different way and understand the vastness of nature. AI research has largely focused on helping computers understand the world in a similar way to humans. The language of proteins is a language that is beyond human comprehension. Even the most powerful computational tools have failed to understand it. AI has the potential of opening up this language to our comprehension. AI can be studied in new domains like biology to gain a better understanding of artificial intelligence. Our research reveals connections across domains. Large language models that are behind machine translation, natural speech understanding, speech recognition, image generation, and machine translation are also able learn deep information about biology. -
50
RedPajama
RedPajama
FreeGPT-4 and other foundation models have accelerated AI's development. The most powerful models, however, are closed commercial models or partially open. RedPajama aims to create a set leading, open-source models. Today, we're excited to announce that the first phase of this project is complete: the reproduction of LLaMA's training dataset of more than 1.2 trillion tokens. The most capable foundations models are currently closed behind commercial APIs. This limits research, customization and their use with sensitive information. If the open community can bridge the quality gap between closed and open models, fully open-source models could be the answer to these limitations. Recent progress has been made in this area. AI is in many ways having its Linux moment. Stable Diffusion demonstrated that open-source software can not only compete with commercial offerings such as DALL-E, but also lead to incredible creative results from community participation.