Best OpenLLaMA Alternatives in 2024
Find the top alternatives to OpenLLaMA currently available. Compare ratings, reviews, pricing, and features of OpenLLaMA alternatives in 2024. Slashdot lists the best OpenLLaMA alternatives on the market that offer competing products similar to OpenLLaMA. Sort through the OpenLLaMA alternatives below to make the best choice for your needs.
-
1
RedPajama
RedPajama
Free
GPT-4 and other foundation models have accelerated AI's development. The most powerful models, however, are closed commercial models or only partially open. RedPajama aims to create a set of leading, fully open-source models. Today, we're excited to announce that the first phase of this project is complete: the reproduction of LLaMA's training dataset of more than 1.2 trillion tokens. The most capable foundation models are currently locked behind commercial APIs, which limits research, customization, and their use with sensitive data. Fully open-source models could remove these limitations, if the open community can bridge the quality gap between closed and open models, and recent progress suggests it can. In many ways, AI is having its Linux moment. Stable Diffusion showed that open-source software can not only compete with commercial offerings such as DALL-E but also inspire incredible creative results through community participation.
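For readers who want to poke at the data itself, here is a minimal sketch of streaming a few records from the RedPajama sample release on the Hugging Face Hub. The dataset id and the "text" field are assumptions based on the public release, so check the dataset card before relying on them.

```python
# A minimal sketch (not from the listing): streaming a slice of the RedPajama
# sample dataset from the Hugging Face Hub. Dataset id and field names are
# assumptions and may differ from the current release.
from datasets import load_dataset

ds = load_dataset(
    "togethercomputer/RedPajama-Data-1T-Sample",  # assumed dataset id
    split="train",
    streaming=True,          # avoid downloading the full corpus
    trust_remote_code=True,  # the dataset ships a loading script
)

for i, example in enumerate(ds):
    print(example.get("text", "")[:200])  # each record carries raw document text
    if i >= 2:
        break
```
-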
2
LongLLaMA
LongLLaMA
Free
This repository contains a research preview of LongLLaMA, a large language model capable of handling contexts of up to 256k tokens. LongLLaMA was built on the foundation of OpenLLaMA and fine-tuned with the Focused Transformer method; LongLLaMA Code was built on the foundation of Code Llama. We release a smaller base variant of LongLLaMA (not instruction-tuned) under a permissive license (Apache 2.0), together with inference code that supports longer contexts in Hugging Face. Our model weights can serve as a drop-in replacement for LLaMA (for short contexts of up to 2048 tokens) in existing implementations. We also provide evaluation results and comparisons against the original OpenLLaMA model.
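As an illustration of the drop-in usage described above, here is a minimal sketch of loading a LongLLaMA checkpoint with Hugging Face transformers. The model id, dtype, and generation settings are assumptions rather than official release notes, so verify them against the repository.

```python
# A minimal sketch, assuming the publicly released LongLLaMA 3B checkpoint;
# the custom Focused Transformer inference code is pulled in via trust_remote_code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "syzymon/long_llama_3b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,
    trust_remote_code=True,  # loads the repository's custom modeling code
)

prompt = "My favourite animal is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
-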
3
PygmalionAI
PygmalionAI
Free
PygmalionAI is a community of open-source projects built on EleutherAI's GPT-J 6B and Meta's LLaMA models, designed for roleplaying and chatting. The 7B variant of Pygmalion, based on Meta AI's LLaMA model, is currently actively supported. Pygmalion's chat capabilities rival those of much larger language models that require far more resources. Our curated, high-quality roleplay datasets help ensure that your bot is the best possible RP partner. Both the model weights and the training code are open source, and you can modify and redistribute them for any purpose. Like other language models, Pygmalion runs on GPUs because producing coherent text at a reasonable speed requires fast memory and massive processing power. -
4
Falcon-40B
Technology Innovation Institute (TII)
Free
Falcon-40B is a 40B-parameter causal decoder-only model built by TII. It was trained on 1,000B tokens of RefinedWeb enhanced with curated corpora, and is available under the Apache 2.0 licence. Why use Falcon-40B? It is the best open-source model available, outperforming LLaMA, StableLM, RedPajama, MPT, and others on the OpenLLM Leaderboard. Its architecture is optimized for inference, with FlashAttention and multi-query attention. It is available under an Apache 2.0 license that allows commercial use without any restrictions or royalties. This is a raw model that should be fine-tuned for most use cases. If you are looking for a model that can take generic instructions in a chat format, we suggest Falcon-40B-Instruct. -
5
Llama 2
Meta
Free
The next generation of our open-source large language model. This release includes model weights and starting code for pretrained and fine-tuned Llama language models ranging from 7B to 70B parameters. Llama 2 models were trained on 2 trillion tokens and have double the context length of Llama 1. The fine-tuned Llama 2 models have additionally been trained on over 1 million human annotations. Llama 2 outperforms many other open-source language models on external benchmarks, including tests of reasoning, coding, proficiency, and knowledge. Llama 2 was pretrained on publicly available online data sources. Llama 2 Chat, the fine-tuned version of the model, leverages publicly available instruction datasets and more than 1 million human annotations. We have a broad range of supporters around the world who are committed to our open approach to today's AI; these companies have provided early feedback and are excited to build with Llama 2. -
6
Falcon-7B
Technology Innovation Institute (TII)
Free
Falcon-7B is a 7B-parameter causal decoder-only model built by TII. It was trained on 1,500B tokens of RefinedWeb enhanced with curated corpora, and is available under the Apache 2.0 licence. Why use Falcon-7B? It outperforms comparable open-source models such as MPT-7B, StableLM, and RedPajama on the OpenLLM Leaderboard, a result of being trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. Its architecture is optimized for inference, with FlashAttention and multi-query attention. It is available under an Apache 2.0 license that allows commercial use without any restrictions or royalties. -
7
Defense Llama
Scale AI
Scale AI is pleased to announce Defense Llama, a large language model (LLM) built on Meta's Llama 3 and customized and fine-tuned to support American national security missions. Defense Llama is available only in controlled U.S. government environments within Scale Donovan. It empowers service members and national security professionals to apply the power of generative AI to their unique use cases, such as planning military or intelligence operations and understanding adversary vulnerabilities. Defense Llama was trained on a vast dataset that includes military doctrine, international human rights law, and relevant policy designed to align with Department of Defense (DoD) guidelines for armed conflict, as well as the DoD's Ethical Principles for Artificial Intelligence. This allows the model to provide accurate, meaningful, and relevant responses. Scale is proud to help U.S. national security personnel use generative AI for defense safely and securely. -
8
OLMo 2
Ai2
OLMo 2 is an open language model family developed by the Allen Institute for AI (Ai2), providing researchers and developers with fully open source code and reproducible training recipes. The models are trained on up to 5 trillion tokens and are competitive with open-weight models such as Llama 3.1 on English academic benchmarks. OLMo 2 focuses on training stability, implementing techniques that prevent loss spikes during long training runs, and uses staged training interventions to address capability deficits during late pretraining. The models incorporate the latest post-training methods from Ai2's Tülu 3, resulting in OLMo 2-Instruct. The Open Language Modeling Evaluation System (OLMES), consisting of 20 benchmarks assessing key capabilities, was created to guide improvements throughout the development stages. -
9
Mixtral 8x7B
Mistral AI
Free
Mixtral 8x7B is a high-quality sparse mixture-of-experts (SMoE) model with open weights, licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks, with 6x faster inference. It is the strongest open-weight model available under a permissive license and the best model overall in terms of cost/performance trade-offs. It matches or exceeds GPT-3.5 on most standard benchmarks.
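To make the sparse mixture-of-experts idea concrete, here is a toy sketch of top-2 expert routing in plain NumPy. The dimensions and expert count are illustrative only and do not reflect Mixtral's actual configuration.

```python
# Toy sketch of sparse MoE routing: a router scores all experts per token,
# but only the top-2 experts are evaluated and combined, which keeps
# inference cheap relative to a dense model of the same total size.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

router_w = rng.normal(size=(d_model, n_experts))               # router projection
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    """x: (d_model,) hidden state for a single token."""
    logits = x @ router_w                                       # score every expert
    top = np.argsort(logits)[-top_k:]                           # keep the top-2 experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()   # softmax over the top-2
    # only the selected experts actually run
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

print(moe_layer(rng.normal(size=d_model)).shape)  # (16,)
```
-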
10
LLaMA
Meta
LLaMA (Large Language Model Meta AI) is a state-of-the-art foundational large language model created to help researchers advance work in this subfield of AI. LLaMA lets researchers study these models using smaller, more efficient versions, further democratizing access to this rapidly changing field. Smaller foundation models such as LLaMA are desirable because they require far less computing power and resources to test new approaches, validate others' work, and explore new use cases. Foundation models are trained on large amounts of unlabeled data, which makes them ideal for fine-tuning on many different tasks. We make LLaMA available in several sizes (7B, 13B, 33B, and 65B parameters) and also share a LLaMA model card that explains how the model was built in line with our Responsible AI practices. -
11
Vicuna
lmsys.org
Free
Vicuna-13B is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation of Vicuna-13B using GPT-4 as a judge shows that it achieves more than 90%* of the quality of OpenAI's ChatGPT and Google Bard and outperforms other models such as LLaMA and Stanford Alpaca. Training Vicuna-13B costs around $300. The online demo, code, and weights are available for non-commercial use. -
12
StarCoder
BigCode
Free
StarCoder and StarCoderBase are large language models for code (Code LLMs) trained on permissively licensed data from GitHub, including 80+ programming languages, Git commits, GitHub issues, and Jupyter notebooks. Similar to LLaMA, we trained a 15B-parameter model on 1 trillion tokens. We then fine-tuned StarCoderBase on 35B Python tokens, resulting in a new model we call StarCoder. StarCoderBase outperforms other open Code LLMs on popular programming benchmarks and matches or exceeds closed models such as OpenAI's code-cushman-001, the original Codex model that powered early versions of GitHub Copilot. With a context length of over 8,000 tokens, StarCoder models can process more input than any other open LLM, enabling a variety of interesting applications. For example, by prompting the StarCoder models with a series of dialogues, we enabled them to act as a technical assistant. -
13
IBM Granite
IBM
Free
IBM® Granite™ is a family of AI models designed from the ground up for business applications, helping ensure trust and scalability in AI-driven apps. Granite models are open source and available today. We want to make AI accessible to as many developers as possible, so we have made the core Granite Code, Time Series, Language, and GeoSpatial models available on Hugging Face under a permissive Apache 2.0 licence that allows broad commercial use. All Granite models are trained on carefully curated data, with a level of transparency about that data that is unmatched in the industry. We have also made available the tools we use to ensure the data is high quality and meets the standards required by enterprise-grade applications. -
14
Hermes 3
Nous Research
Free
Hermes 3 features advanced long-term context retention and multi-turn conversation capabilities, complex roleplaying and internal monologue abilities, and enhanced agentic function calling. Our training data very aggressively encourages the model to follow system prompts and instructions exactly and in a highly adaptive manner. Hermes 3 was created by fine-tuning Llama 3.1 at the 8B, 70B, and 405B sizes on a dataset consisting primarily of synthetically generated responses. The model's performance is comparable to Llama 3.1, but with deeper reasoning and creative abilities. Hermes 3 is an instruct- and tool-use model series with strong reasoning and creative capabilities. -
15
TinyLlama
TinyLlama
Free
The TinyLlama project aims to pretrain a 1.1B-parameter Llama model on 3 trillion tokens. With some optimization, we can achieve this in "just" 90 days using 16 A100-40G GPUs. We use exactly the same architecture and tokenizer as Llama 2, so TinyLlama is compatible with many open-source projects built on Llama. With only 1.1B parameters, TinyLlama's compactness suits a variety of applications that demand a small compute and memory footprint.
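As a sketch of how TinyLlama's Llama-compatible weights can be used, the snippet below loads a TinyLlama chat checkpoint through the standard transformers pipeline; the checkpoint name is an assumption, so confirm it on the Hugging Face Hub.

```python
# A minimal sketch of using TinyLlama as a small drop-in Llama-architecture model.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # assumed checkpoint name
)
out = pipe("Explain what a tokenizer does in one sentence.", max_new_tokens=48)
print(out[0]["generated_text"])
```
-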
16
MPT-7B
MosaicML
Free
Introducing MPT-7B, the latest addition to our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML Platform in 9.5 days with zero human intervention at a cost of roughly $200k. You can now train, fine-tune, and deploy your own private MPT models, either starting from one of our checkpoints or training from scratch. For inspiration, we are also releasing three fine-tuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which uses a context length of 65k tokens! -
17
Alpaca
Stanford Center for Research on Foundation Models (CRFM)
Instruction-following models such as GPT-3.5 (text-davinci-003), ChatGPT, Claude, and Bing Chat have become increasingly powerful. Many users now interact with these models regularly, some even for work. However, despite their widespread deployment, instruction-following models still have many deficiencies: they can generate false information, propagate social stereotypes, and produce toxic language. To make maximum progress on these pressing problems, it is vital for the academic community to engage. Unfortunately, doing research on instruction-following models in academia has been difficult, as there is no easily accessible model that comes close in capability to closed-source models such as OpenAI's text-davinci-003. We are releasing our findings about an instruction-following language model, dubbed Alpaca, which is fine-tuned from Meta's LLaMA 7B model. -
18
Llama 3.2
Meta
Free
There are now more versions of the open-source AI model that you can fine-tune, distill, and deploy anywhere. Llama 3.2 is a collection of pretrained and fine-tuned large language models (LLMs) that come in 1B and 3B sizes, which are multilingual and text-only, and 11B and 90B sizes, which accept both text and image inputs and output text. Our latest release lets you build highly efficient, performant applications: use the 1B and 3B models for on-device applications, such as summarizing a conversation on your phone or calling on-device features like the calendar, and use the 11B and 90B models to transform an existing image or extract more information from a picture of your surroundings. -
19
Codestral
Mistral AI
Free
We are proud to introduce Codestral, our first-ever code model. Codestral is an open-weight generative AI model explicitly designed for code generation. It helps developers write and interact with code through a shared instruction and completion API endpoint. As it masters both code and English, it can be used to build advanced AI applications for software developers. Codestral was trained on a diverse dataset of 80+ programming languages, including the most popular ones such as Python, Java, C, C++, JavaScript, and Bash. It also performs well on more specialized languages such as Swift and Fortran. This broad language base allows Codestral to assist developers across a wide range of coding environments and projects. -
20
OpenELM
Apple
OpenELM is a family of open-source language models developed by Apple. It uses a layer-wise scaling strategy to allocate parameters efficiently within each layer of the transformer model, leading to improved accuracy compared with other open language models. OpenELM was trained on publicly available datasets and achieves the best performance for its size. -
21
Aya
Cohere AI
Aya is an open-source, state-of-the-art, massively multilingual large language research model (LLM) covering 101 languages, more than twice the number covered by existing open-source models. Aya helps researchers unlock the powerful potential of LLMs for dozens of cultures and languages largely ignored by today's most advanced models. We are open-sourcing both the Aya model and the most comprehensive multilingual instruction dataset to date, with 513 million prompts and completions covering 114 languages. This data collection includes rare annotations from native and fluent speakers around the world, ensuring that AI technology can effectively serve a broad global audience that has had limited access until now. -
22
Llama 3.1
Meta
Free
An open-source AI model that you can fine-tune, distill, and deploy anywhere. Our latest instruction-tuned models are available in 8B, 70B, and 405B versions. Our open ecosystem lets you build faster with a range of differentiated product offerings that support your use cases. Choose between real-time and batch inference, download model weights to further optimize cost per token, adapt the model to your application, improve it with synthetic data, and deploy on-prem. Use Llama components and extend the model with RAG and zero-shot tools to build agentic behaviors, and use the 405B model to generate high-quality data for improving specialized models for specific use cases. -
23
Giga ML
Giga ML
We have just launched the X1 Large model series. Giga ML's most powerful model can be used for pre-training, fine-tuning, and on-prem deployment. We are OpenAI-compatible, so your existing integrations, such as LangChain, LlamaIndex, and others, will work seamlessly. You can continue to pre-train LLMs on domain-specific data or company documents. The world of large language models (LLMs), which offer unprecedented opportunities for natural language processing across different domains, is expanding rapidly. Despite this, some critical challenges remain unresolved. Giga ML proudly introduces the X1 Large 32k model, a pioneering on-premise LLM solution that addresses these critical challenges.
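To illustrate what OpenAI compatibility typically means in practice, here is a hypothetical sketch that points the standard OpenAI Python client at an on-prem endpoint. The base URL and model name are placeholders, not documented Giga ML values.

```python
# Hypothetical sketch of calling an OpenAI-compatible on-prem endpoint;
# base_url and model name are placeholders, not documented Giga ML values.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-giga-ml-host/v1",  # hypothetical on-prem endpoint
    api_key="YOUR_KEY",
)

resp = client.chat.completions.create(
    model="x1-large-32k",  # hypothetical model name
    messages=[{"role": "user", "content": "Summarize our Q3 report."}],
)
print(resp.choices[0].message.content)
```
-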
24
Baichuan-13B
Baichuan Intelligent Technology
Free
Baichuan-13B is an open-source, commercially available large language model with 13 billion parameters, developed by Baichuan Intelligent Technology as the successor to Baichuan-7B. It achieves the best results among models of its size on authoritative Chinese and English benchmarks. This release includes two versions: pretraining (Baichuan-13B-Base) and alignment (Baichuan-13B-Chat). Baichuan-13B expands the parameter count to 13 billion on the basis of Baichuan-7B and is trained on 1.4 trillion tokens of high-quality corpus, 40% more than LLaMA-13B, making it the open-source 13B-size model with the most training data to date. It supports both Chinese and English, uses ALiBi positional encoding, and has a context window of 4,096 tokens. -
25
RoBERTa
Meta
Free
RoBERTa builds on BERT's language-masking strategy, in which the system learns to predict intentionally hidden sections of text within otherwise unannotated language examples. Implemented in PyTorch, RoBERTa modifies key hyperparameters of BERT, including removing BERT's next-sentence pretraining objective and training with larger mini-batches. This allows RoBERTa to improve on the masked language modeling objective compared with BERT and leads to better downstream task performance. We also explored training RoBERTa on much more data than BERT and for a longer time, using both existing unannotated NLP datasets and CC-News, a novel set of public news articles.
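The masked-language-modeling objective described above can be seen directly with the fill-mask pipeline; the sketch below uses the publicly available roberta-base checkpoint.

```python
# A minimal sketch of RoBERTa's masked-language-modeling objective in use:
# the model predicts the token hidden behind <mask>.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")
for pred in fill("The goal of language modeling is to predict the next <mask>.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```
-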
26
Granite Code
IBM
Free
We introduce the Granite family of decoder-only code models for code-generation tasks (e.g., fixing bugs, explaining code, documenting code), trained on code written in 116 programming languages. Evaluated across a variety of tasks, the Granite Code family consistently ranks among the best open-source code LLMs. Granite Code models have a number of key advantages: they perform at a competitive or even state-of-the-art level on a range of code-related tasks, including code generation, explanation, fixing, translation, and editing, demonstrating the ability to solve a wide variety of coding problems. IBM's Corporate Legal team guides all models for trustworthy enterprise use, and all models are trained on permissively licensed data collected in accordance with IBM's AI Ethics Principles. -
27
DBRX
Databricks
Databricks has created DBRX, an open, general-purpose LLM. DBRX sets a new benchmark for open LLMs, and it gives open communities and enterprises building their own LLMs capabilities that were previously limited to closed model APIs. According to our measurements, it surpasses GPT-3.5 and is competitive with Gemini 1.0 Pro. As a code model, it is more capable than specialized models such as CodeLLaMA-70B, in addition to its strength as a general-purpose LLM. This state-of-the-art quality comes with marked improvements in both training and inference performance. Thanks to its fine-grained mixture-of-experts (MoE) architecture, DBRX is the most efficient open model: inference is up to 2x faster than LLaMA2-70B, and DBRX has about 40% fewer parameters, in both total and active counts, than Grok-1. -
28
OpenGPT-X
OpenGPT-X
Free
OpenGPT-X is a German initiative focused on developing large AI language models tailored to European requirements, with an emphasis on versatility, trustworthiness, multilingual capability, and open-source accessibility. The project brings together partners covering the entire generative AI value chain, from scalable GPU-based infrastructure and data for training large language models to model design, practical applications, prototypes, and proofs of concept. OpenGPT-X aims to advance cutting-edge research with a strong focus on business applications, thereby accelerating the adoption of generative AI in the German economy. The project also stresses responsible AI development to ensure the models are reliable and aligned with European values and laws. It provides resources such as the LLM Workbook and a three-part reference guide with examples and resources to help users better understand the key features and characteristics of large AI language models. -
29
CodeGemma
Google
CodeGemma is a collection of powerful, lightweight models that can perform a variety of coding tasks, including fill-in-the-middle code completion, code generation, natural language understanding, and mathematical reasoning. CodeGemma comes in three variants: a 7B model pretrained for code completion and code generation, a 7B instruction-tuned model for instruction following and natural-language-to-code chat, and a 2B model pretrained for fast code completion. Complete lines, functions, and even entire blocks of code, whether you are working locally or using Google Cloud resources. Trained on 500 billion tokens of primarily English-language data from web documents, mathematics, and code, CodeGemma models generate code that is not only syntactically correct but also semantically meaningful, reducing errors and debugging time.
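As a rough illustration of fill-in-the-middle prompting, the sketch below formats a prompt with the FIM sentinel tokens as I understand them from the public release; treat the model id and token names as assumptions and check the model card.

```python
# A minimal sketch of fill-in-the-middle prompting with a CodeGemma checkpoint.
# Model id and FIM sentinel tokens are assumptions; verify against the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/codegemma-2b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Prefix and suffix surround the gap the model should fill.
prompt = "<|fim_prefix|>def mean(xs):\n    return <|fim_suffix|>\n<|fim_middle|>"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=24)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
-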
30
Stable Beluga
Stability AI
Free
Stability AI, in collaboration with its CarperAI lab, announces Stable Beluga 1 (formerly codenamed FreeWilly) and its successor Stable Beluga 2, two powerful new large language models. Both models show exceptional reasoning ability across a variety of benchmarks. Stable Beluga 1 leverages the original LLaMA 65B foundation model and was carefully fine-tuned with a new synthetically generated dataset using Supervised Fine-Tuning (SFT) in standard Alpaca format. Stable Beluga 2 leverages the LLaMA 2 70B foundation model to reach industry-leading performance. -
31
OPT
Meta
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable zero- and few-shot learning capabilities. Given their high computational cost, these models are expensive to replicate, and the few that are available via APIs do not allow access to the full model weights, making them difficult to study. Open Pre-trained Transformers (OPT) is a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to share fully and responsibly with interested researchers. We show that OPT-175B has a carbon footprint of roughly 1/7th that of GPT-3. We are also releasing our logbook, which details the infrastructure challenges we encountered, along with code for experimenting with all of the released models. -
32
Stable LM
Stability AI
Free
StableLM: Stability AI language models. StableLM builds on our experience open-sourcing earlier language models in collaboration with EleutherAI, a nonprofit research hub. Those models include GPT-J, GPT-NeoX, and the Pythia suite, which were all trained on The Pile dataset; Cerebras-GPT and Dolly-2 are two recent open-source models that continue to build on these efforts. StableLM was trained on a new dataset that is three times larger than The Pile and contains 1.5 trillion tokens; we will provide more details about the dataset at a later date. The richness of this dataset allows StableLM to perform well in conversational and coding tasks despite its small size of 3 to 7 billion parameters (by comparison, GPT-3 has 175 billion). The development of Stable LM 3B broadens the range of applications that are viable on the edge or on home PCs, meaning individuals and companies can now develop cutting-edge technologies with strong conversational capabilities, like creative writing assistance, while keeping costs low and performance high. -
33
DeepSeek LLM
DeepSeek
Introducing DeepSeek LLM, an advanced language model with 67 billion parameters. It was trained from scratch on a massive dataset of 2 trillion tokens in both English and Chinese. To encourage research, we have made DeepSeek LLM 67B Base and DeepSeek LLM 67B Chat available as open source to the research community. -
34
Mistral 7B
Mistral AI
We solve the most difficult problems to make AI models efficient, helpful, and reliable. We are pioneers of open models: we give them to our users and empower them to share their ideas. Mistral 7B is a powerful small model that can be adapted to many different use cases. Mistral 7B outperforms Llama 2 13B on all benchmarks, has an 8k sequence length and natural coding capabilities, and is faster than Llama 2. It is released under the Apache 2.0 license, and we made it simple to deploy on any cloud. -
35
Cerebras-GPT
Cerebras
Free
Training state-of-the-art language models is extremely difficult: it requires huge compute budgets, complex distributed computing techniques, and deep ML expertise. As a result, few organizations train large language models from scratch, and increasingly those with the expertise and resources to do so choose not to open source their results. We at Cerebras believe in fostering open access to the most advanced models. Cerebras is proud to announce the release of Cerebras-GPT, a family of GPT models ranging from 111 million to 13 billion parameters, to the open-source community. Trained using the Chinchilla formula, these models provide the highest accuracy for a given compute budget. Cerebras-GPT has faster training times and lower training costs, and it consumes less power than any other publicly available model.
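To illustrate the Chinchilla formula mentioned above, the sketch below applies the common approximation of roughly 20 training tokens per model parameter; this is an approximation for intuition, not Cerebras' exact recipe.

```python
# Rough illustration of the Chinchilla compute-optimal rule (~20 tokens per
# parameter); an approximation, not Cerebras' exact training configuration.
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    return n_params * tokens_per_param

for params in (111e6, 1.3e9, 13e9):
    print(f"{params / 1e9:.3g}B params -> ~{chinchilla_optimal_tokens(params) / 1e9:.0f}B tokens")
```
-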
36
Qwen
Alibaba
Free
Qwen LLM refers to a family of large language models (LLMs) developed by Damo Academy, an Alibaba Cloud subsidiary. These models are trained on a massive dataset of text and code, allowing them to understand and generate human-like text, translate languages, write different kinds of creative content, and answer questions in an informative way. Here are some key features of Qwen LLMs. Variety of sizes: the Qwen series ranges from 1.8 billion to 72 billion parameters, offering options for different needs and performance levels. Open source: certain versions of Qwen are open source, with code freely available for anyone to use and modify. Multilingual: Qwen can understand and translate multiple languages, including English, Chinese, and Japanese. Versatile: Qwen models can handle a wide range of tasks, including text summarization, code generation, and translation. -
37
Teuken 7B
OpenGPT-X
Free
Teuken-7B is a multilingual, open-source language model developed under the OpenGPT-X project, specifically designed to accommodate Europe's diverse linguistic landscape. To ensure robust performance, it was trained on a dataset comprising over 50% non-English text and covering all 24 official European Union languages. A key innovation is Teuken-7B's custom multilingual tokenizer, optimized for European languages, which enhances training efficiency. The model comes in two versions: Teuken-7B Base, a pre-trained foundational model, and Teuken-7B Instruct, a version tuned to follow user prompts more closely. Both versions are available on Hugging Face, promoting transparency and collaboration within the AI community. The development of Teuken-7B demonstrates a commitment to creating AI models that reflect Europe's diversity. -
38
Mistral NeMo
Mistral AI
Free
Mistral NeMo: our new best small model. A state-of-the-art 12B model with a 128k-token context window, released under the Apache 2.0 license and built in collaboration with NVIDIA. Mistral NeMo's reasoning, world knowledge, and coding accuracy are among the best in its size category. Relying on a standard architecture, Mistral NeMo is easy to use and a drop-in replacement for any system using Mistral 7B. We have released pre-trained base and instruction-tuned checkpoints under Apache 2.0 to encourage adoption by researchers and enterprises. Mistral NeMo was trained with quantization awareness, enabling FP8 inference without any loss of performance. The model is designed for global, multilingual applications; it is trained on function calling, has a large context window, and is better than Mistral 7B at following instructions, reasoning, and handling multi-turn conversations. -
39
Dolly
Databricks
Free
Dolly is an inexpensive LLM that demonstrates a surprising amount of the capabilities of ChatGPT. Whereas the work from the Alpaca team showed that state-of-the-art models could be coaxed into high-quality instruction-following behavior, we find that even years-old open-source models with much earlier architectures exhibit striking behaviors when fine-tuned on a small corpus of instruction training data. Dolly uses an existing open-source 6-billion-parameter model from EleutherAI, modified to elicit instruction-following capabilities such as brainstorming and text generation that were not present in the original model. -
40
Codestral Mamba
Mistral AI
Codestral Mamba is a Mamba2 language model specialized in code generation, available under the Apache 2.0 license. Codestral Mamba is another step in our effort to study and provide new architectures, and we hope it will open new perspectives in architecture research. Mamba models offer the advantage of linear-time inference and the theoretical ability to model sequences of unlimited length, allowing users to engage extensively with the model and get quick responses regardless of input length. This efficiency is especially relevant for code-productivity use cases, so we trained this model with advanced code and reasoning capabilities, enabling it to perform on par with state-of-the-art Transformer-based models. -
41
InstructGPT
OpenAI
$0.0200 per 1,000 tokens
InstructGPT is a family of OpenAI language models fine-tuned with reinforcement learning from human feedback (RLHF) to follow natural language instructions. Starting from a pre-trained GPT model, human-written demonstrations and human preference rankings are used to align the model's outputs with user intent, making InstructGPT more helpful and truthful, and less prone to toxic output, than the base model. InstructGPT is designed to be useful across domains including robotics, gaming, and education: it can help translate user goals into step-by-step instructions or help students learn by giving descriptive explanations of events or processes.
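As a worked example of the listed price, the sketch below estimates the cost of a hypothetical number of tokens at $0.0200 per 1,000 tokens; the token count is made up for illustration.

```python
# Simple cost estimate at the listed $0.0200 per 1,000 tokens.
PRICE_PER_1K_TOKENS = 0.02

def estimate_cost(total_tokens: int) -> float:
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

print(f"${estimate_cost(150_000):.2f} for 150k tokens")  # -> $3.00
```
-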
42
OpenScholar
Ai2
Ai2 OpenScholar, a collaboration between the Allen Institute for AI and the University of Washington, is designed to help scientists navigate and synthesize the vast expanse of scientific literature. OpenScholar uses a retrieval-augmented language model to answer user queries: it first identifies relevant papers and then generates answers grounded in those sources, ensuring that information is accurate and linked directly to existing research. OpenScholar-8B sets new standards for factuality and citation accuracy on the ScholarQABench benchmark. In the biomedical domain, for example, OpenScholar-8B stays grounded in genuinely retrieved articles, in contrast to models such as GPT-4 that tend to hallucinate references. To evaluate its real-world applicability, twenty scientists from computer science, biomedicine, and physics compared OpenScholar's answers against expert-written responses. -
43
Galactica
Meta
Information overload is a major barrier to scientific progress. The explosion of scientific literature and data makes it ever harder to find useful insights in a vast amount of information. Scientific knowledge is accessed today through search engines, but they cannot organize it. Galactica is a large language model that can store, combine, and reason about scientific information. We train it on a large corpus of scientific papers, reference material, and knowledge bases, among other sources, and it outperforms other models on a variety of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% to 49.0%. Galactica is also good at reasoning, outperforming Chinchilla on mathematical MMLU (41.3% vs. 35.7%) and PaLM 540B on MATH (20.4% vs. 8.8%). -
44
Llama 3
Meta
Free
Meta AI is our intelligent assistant that helps people create, connect, and get things done, and we've integrated Llama 3 into it. You can use Meta AI for coding and problem solving to see Llama 3's performance first-hand. Llama 3, available in 8B and 70B sizes, provides the capability and flexibility you need to develop your ideas, whether you're building AI-powered agents or other applications. We've updated our Responsible Use Guide (RUG) to provide the most comprehensive, up-to-date guidance on responsible development with LLMs. Our system-centric approach includes updates to our trust and safety tools, including Llama Guard 2, optimized to support MLCommons' newly announced taxonomy, Code Shield, and CyberSec Eval 2. -
45
Gemma
Google
Gemma is a family of lightweight, open models built from the same research and technology used to create the Gemini models. Developed by Google DeepMind and other teams across Google, Gemma takes its name from the Latin gemma, meaning "precious stone." Alongside the model weights, we're also releasing tools to support developer innovation, foster collaboration, and guide responsible use of Gemma models. Gemma models share technical and infrastructure components with Gemini, Google's largest and most capable AI model. Gemma 2B and 7B achieve best-in-class performance for their sizes compared with other open models, and they can run directly on a developer's laptop or desktop computer. Notably, Gemma surpasses significantly larger models on key benchmarks while adhering to our rigorous standards for safe and responsible outputs. -
46
CodeGen
Salesforce
Free
CodeGen is an open-source model for program synthesis. Trained on TPU-v4, it is competitive with OpenAI Codex.
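As a sketch of program synthesis in practice, the snippet below generates code from a comment-style prompt with a small CodeGen checkpoint; the checkpoint name is an assumption based on the public Salesforce release.

```python
# A minimal sketch of program synthesis with a small CodeGen checkpoint;
# the model id is an assumption, so verify it on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/codegen-350M-mono"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "# Python function that returns the n-th Fibonacci number\ndef fib(n):"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
-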
47
GPT-4
OpenAI
GPT-4 (Generative Pre-trained Transformer 4) is a large-scale language model and the successor to GPT-3 in OpenAI's GPT-n series of natural language processing models. It was trained on a massive text corpus to produce human-like text generation and understanding abilities. Unlike many other NLP models, GPT-4 does not depend on additional task-specific training data: it can generate text and answer questions from its own context, and it has been shown to perform a wide range of tasks, such as translation, summarization, and sentiment analysis, without task-specific training.
-
48
GPT-J
EleutherAI
Free
GPT-J is a cutting-edge language model developed by EleutherAI. Its performance is comparable to OpenAI's GPT-3 on a variety of zero-shot tasks, and it has even been shown to surpass GPT-3 on tasks related to code generation. The latest version of this language model, GPT-J-6B, was trained on a linguistic dataset called The Pile, a publicly available dataset containing 825 gibibytes of language data organized into 22 subsets. GPT-J shares some similarities with ChatGPT; however, GPT-J is not intended to be a chatbot, and its primary function is text prediction. Databricks made a major contribution in March 2023 when it introduced Dolly, an Apache-licensed model that follows instructions.
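Since GPT-J's primary function is text prediction, here is a minimal sketch of generating a continuation with transformers; note that loading the full 6B model requires substantial memory.

```python
# A minimal sketch of next-token text prediction with GPT-J;
# the 6B checkpoint needs a machine with enough RAM or GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The Pile is a dataset that", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
-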
49
NVIDIA Nemotron
NVIDIA
NVIDIA Nemotron, a family of open-source models created by NVIDIA, is designed to generate synthetic language data for commercial applications. The Nemotron-4 340B model is a notable release in this family, offering developers a powerful tool for generating high-quality data and filtering it based on various attributes using a reward model. -
50
Sarvam AI
Sarvam AI
We are developing large language models that are efficient for India's linguistic and cultural diversity, and enabling GenAI applications through bespoke enterprise models. We are building an enterprise-grade platform on which you can develop and evaluate applications. We believe open source can accelerate AI innovation, and we will contribute open-source datasets and models and lead efforts on large-scale data-curation projects in the public-good space. We are a dynamic team of AI experts, combining expertise in research, product design, engineering, and business operations. Our diverse backgrounds are united by a commitment to excellence in science and to creating societal impact. We foster an environment in which tackling complex tech problems is not only a job but a passion.