Best Mistral Large 2 Alternatives in 2024
Find the top alternatives to Mistral Large 2 currently available. Compare ratings, reviews, pricing, and features of Mistral Large 2 alternatives in 2024. Slashdot lists the best Mistral Large 2 alternatives on the market that offer competing products similar to Mistral Large 2. Sort through the alternatives below to make the best choice for your needs.
-
1
Jamba
AI21 Labs
Jamba is a powerful and efficient long-context model, open to builders but built for enterprises. Jamba's latency is superior to that of all other leading models of similar size. Its 256K context window is the longest available. Jamba's hybrid Mamba-Transformer MoE architecture is designed to increase efficiency and reduce costs. Jamba includes key features out of the box, including function calling, JSON output mode, document objects, and citation mode. Jamba 1.5 models deliver high performance throughout the entire context window and score highly on common quality benchmarks. Secure deployment can be tailored to your enterprise: start using Jamba immediately on our production-grade SaaS platform, or deploy the Jamba model family through our strategic partners. For enterprises that require custom solutions, we offer VPC and on-premise deployments, along with hands-on management and continuous pre-training for unique, bespoke needs. -
2
BLACKBOX AI
BLACKBOX AI
Free. 1 Rating. Available in more than 20 programming languages, including Python, JavaScript, TypeScript, Ruby, Go, and many others. BLACKBOX AI code search was created so that developers could find the best code fragments to use when building amazing products. Integrations with IDEs include VS Code, GitHub Codespaces, Jupyter Notebook, Paperspace, and many more. You can search code in languages such as Python, Java, C++, C#, SQL, PHP, Go, and TypeScript without leaving your coding environment. Blackbox also lets you select code from any video and copy it straight into your text editor; it supports all programming languages and preserves the correct indentation. The Pro plan allows you to copy text from over 200 spoken languages and all programming languages. -
3
Qwen2.5
QwenLM
Free. Qwen2.5 is an advanced multimodal AI system designed to provide highly accurate, context-aware responses across a variety of applications. It builds on its predecessors' capabilities, integrating cutting-edge natural language understanding, enhanced reasoning, creativity, and multimodal processing. Qwen2.5 can analyze and generate text, interpret images, and interact with complex data in real time. It is highly adaptable and excels at personalized assistance, data analytics, creative content creation, and academic research, making it a versatile tool for professionals and everyday users alike. Its user-centric approach emphasizes transparency, efficiency, and alignment with ethical AI principles. -
4
Marco-o1
AIDC-AI
Free. Marco-o1 is an advanced AI model designed for high-performance problem solving and natural language processing. It aims to deliver precise, contextually rich answers by combining deep language understanding with a streamlined architecture built for speed and efficiency. Marco-o1 is a versatile AI system that excels across a wide range of tasks, including conversational AI, content creation, technical assistance, and decision-making, adapting seamlessly to the needs of diverse users. With a focus on intuitive interaction, reliability, and ethical AI principles, Marco-o1 is a cutting-edge solution for individuals and organizations seeking intelligent, adaptive, and scalable AI tools. Monte Carlo Tree Search (MCTS) allows exploration of multiple reasoning paths, using confidence scores derived from the softmax-applied log probabilities of the top-k alternative tokens to guide the model toward optimal solutions.
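As a rough illustration of the confidence scoring described above, the following sketch applies a softmax over the log probabilities of the top-k candidate tokens at each step and averages the chosen token's confidence along a reasoning path. The function names and the averaging rule are illustrative assumptions, not Marco-o1's actual implementation.

    import math

    def token_confidence(topk_logprobs):
        # topk_logprobs: log probabilities of the top-k candidate tokens at
        # one generation step, with the chosen token first. A softmax over
        # this restricted set yields the chosen token's relative confidence.
        exps = [math.exp(lp) for lp in topk_logprobs]
        return exps[0] / sum(exps)

    def path_confidence(steps):
        # steps: per generated token, the top-k logprobs (chosen token first).
        # The path score used to guide MCTS node selection is taken here as
        # the mean confidence of the chosen tokens along the path.
        return sum(token_confidence(s) for s in steps) / len(steps)

    # Toy example: two generation steps with top-3 logprobs each.
    print(path_confidence([[-0.1, -2.3, -3.0], [-0.5, -1.2, -2.0]]))

-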
5
Samsung Gauss
Samsung
Samsung Gauss is a new AI model developed by Samsung Electronics. It is a large language model (LLM) trained on a massive dataset. Samsung Gauss can generate text, translate between languages, create creative content, and answer questions in a helpful way. Although still in development, it has already mastered many tasks, including: following instructions and completing requests with care; answering questions in an informative and comprehensive way, even when they are open-ended, challenging, or strange; and creating different creative text formats such as poems, code, musical pieces, emails, and letters. Here are some examples of what Samsung Gauss can do. Translation: Samsung Gauss can translate text between many languages, including English, German, Spanish, Chinese, Japanese, and Korean. Coding: Samsung Gauss can generate code. -
6
mT5
Google
Free. Multilingual T5 (mT5) is a massively multilingual pretrained text-to-text transformer model, trained following a recipe similar to T5's. This repo can be used to reproduce the experiments described in the mT5 paper. The mC4 corpus covers 101 languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, and more. -
7
Codestral
Mistral AI
Free. We are proud to introduce Codestral, our first-ever code model. Codestral is an open-weight generative AI model explicitly designed for code generation. It helps developers write and interact with code through a shared API endpoint for instructions and completion. Because it masters both code and English, it can be used to build advanced AI applications for software developers. Codestral is trained on a diverse dataset of 80+ programming languages, including the most popular ones such as Python, Java, C, C++, JavaScript, and Bash. It also performs well on more specific ones like Swift and Fortran. This broad language base allows Codestral to assist developers in a wide variety of coding environments and projects. -
8
Mistral NeMo
Mistral AI
Free. Mistral NeMo: our new best small model. A state-of-the-art 12B model with a 128k-token context window, released under the Apache 2.0 license and built in collaboration with NVIDIA. Its reasoning, world knowledge, and coding accuracy are among the best in its size category. Because it relies on a standard architecture, Mistral NeMo is easy to use and serves as a drop-in replacement for any system that uses Mistral 7B. We have released pre-trained base and instruction-tuned checkpoints under Apache 2.0 to encourage adoption by researchers and enterprises. Mistral NeMo was trained with quantization awareness, enabling FP8 inference without performance loss. The model is designed for global, multilingual applications: it is trained on function calling, has a large context window, and is better than Mistral 7B at following instructions, reasoning, and handling multi-turn conversations. -
9
Codestral Mamba
Mistral AI
Codestral Mamba is a Mamba2 language model specialized in code generation, available under the Apache 2.0 license. Codestral Mamba is another step in our effort to study and provide new architectures, and we hope it opens new perspectives in architecture research. Mamba models offer the advantage of linear-time inference and the theoretical ability to model sequences of unlimited length. This lets users interact with the model extensively, with rapid responses regardless of input length, an efficiency that is particularly relevant for code-productivity use cases. We trained this model with advanced reasoning and code capabilities, enabling it to perform on par with SOTA Transformer-based models. -
10
Claude
Anthropic
Claude is an artificial intelligence language model that can process and generate human-like text. Anthropic is an AI safety and research company focused on building reliable, interpretable, and steerable AI systems. While large, general systems can provide significant benefits, they can also be unpredictable, unreliable, and opaque; our goal is to make progress on these fronts. We are currently focused on research toward these goals, and we see many opportunities for our future work to create value both commercially and for the public good.
-
11
Claude Pro
Anthropic
$18/month. Claude Pro is a large language model that can handle complex tasks while maintaining a friendly, accessible demeanor. Trained on extensive, high-quality data, it excels at understanding context, interpreting subtleties, and producing well-structured, coherent responses on a wide variety of topics. Claude Pro can create detailed reports, write creative content, summarize long documents, and assist with coding tasks by leveraging its robust reasoning capabilities and refined knowledge base. Its adaptive algorithms constantly improve its ability to learn from feedback, ensuring that its output is accurate, reliable, and helpful. Whether serving professionals who need expert support or individuals seeking quick, informative answers, Claude Pro delivers a versatile and productive conversational experience. -
12
Code Llama
Meta
Free. Code Llama is a large language model (LLM) that can generate code from text prompts. Code Llama is the most advanced publicly available LLM for code tasks and has the potential to make workflows faster for developers and lower the barrier for people learning to code. Code Llama can be used as a productivity and educational tool to help programmers write more robust, well-documented software. A state-of-the-art LLM, Code Llama can generate both code and natural language about code, from both code and natural-language prompts. Code Llama is free for research and commercial use. Built on top of Llama 2, Code Llama is available in three versions: Code Llama, the foundational code model; Code Llama - Python, specialized for Python; and Code Llama - Instruct, fine-tuned to interpret natural-language instructions. -
13
StarCoder
BigCode
Free. StarCoder and StarCoderBase are Large Language Models for Code (Code LLMs) trained on permissively licensed data from GitHub, including data from 80+ programming languages, Git commits, GitHub issues, and Jupyter notebooks. Similar to LLaMA, we trained a 15B-parameter model for 1 trillion tokens. We then fine-tuned StarCoderBase on 35B Python tokens, resulting in a new model we call StarCoder. StarCoderBase outperforms other open Code LLMs on popular programming benchmarks and matches or exceeds closed models such as OpenAI's code-cushman-001, the original Codex model that powered early versions of GitHub Copilot. With a context length of over 8,000 tokens, StarCoder models can process more input than any other open LLM, enabling a variety of interesting applications. For example, by prompting the StarCoder models with a series of dialogues, we enabled them to act as a technical assistant. -
14
Gemini 2.0
Google
Free. Gemini 2.0, an advanced AI model developed by Google, is designed to offer groundbreaking capabilities for natural language understanding, reasoning, and multimodal interaction. Building on the success of its predecessor, Gemini 2.0 integrates large-scale language processing with enhanced problem-solving, decision-making, and interpretation abilities, allowing it to produce human-like responses with greater accuracy and nuance. Unlike traditional AI models, Gemini 2.0 is trained to handle multiple data types at once, including text, code, and images. This makes it a versatile tool for research, education, business, and creative industries. Its core improvements include better contextual understanding, reduced bias, and a more efficient architecture that ensures quicker, more reliable results. Gemini 2.0 is positioned as a major step in the evolution of AI, pushing the limits of human-computer interaction. -
15
Mixtral 8x22B
Mistral AI
Free. Mixtral 8x22B is our latest open model. It sets a new standard for performance and efficiency within the AI community. It is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. It is fluent in English, French, Italian, German, and Spanish, and has strong mathematics and coding capabilities. It is natively capable of function calling; together with the constrained-output mode implemented on La Plateforme, this enables application development at scale and tech-stack modernization. Its 64K-token context window allows precise information recall from large documents. We build models with unmatched cost efficiency for their respective sizes, delivering the best performance-to-cost ratio among models provided by the community. Mixtral 8x22B is a natural continuation of our open model family, and its sparse activation patterns make it faster than any dense 70B model.
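To make the sparse-expert idea concrete, here is a minimal top-2 gating sketch in plain Python: each token is routed to the two highest-scoring of eight experts, so only a fraction of the total parameters does work per token. The expert count matches the 8x naming, but the gating details are illustrative of SMoE models in general, not Mixtral's exact implementation.

    import math, random

    NUM_EXPERTS, TOP_K = 8, 2

    def softmax(xs):
        m = max(xs)
        exps = [math.exp(x - m) for x in xs]
        s = sum(exps)
        return [e / s for e in exps]

    def route(gate_logits):
        # Keep the TOP_K highest-scoring experts and renormalize their
        # weights; the remaining experts do no work for this token.
        top = sorted(range(NUM_EXPERTS), key=lambda i: gate_logits[i])[-TOP_K:]
        weights = softmax([gate_logits[i] for i in top])
        return list(zip(top, weights))

    # Toy example: random gate logits for a single token.
    print(route([random.gauss(0, 1) for _ in range(NUM_EXPERTS)]))

-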
16
TextGears
TextGears
$4.90. TextGears provides translation, paraphrasing, and text-checking services for hundreds of companies around the globe. A free demo is available online. The API allows you to integrate TextGears text analysis into any modern software product. On-premise installation is the best option for companies that cannot use services outside the corporate network. Supported languages include English, French, German, Portuguese, Russian, Italian, Arabic, Spanish, Japanese, Chinese, and Greek. -
17
ChatGPT
OpenAI
ChatGPT is a language model developed by OpenAI. It can generate human-like responses to a variety of prompts and has been trained on a wide range of internet text. ChatGPT can be used for natural language processing tasks such as conversation, question answering, and text generation. It is a pre-trained language model that uses deep learning algorithms to generate text; it was trained on large amounts of text data, which allows it to respond to a wide variety of prompts with human-like fluency. It uses a transformer architecture, which has proven effective across many NLP tasks. In addition to answering questions, ChatGPT can perform text classification and language translation, allowing developers to build powerful NLP applications that carry out specific tasks more accurately. ChatGPT can also process and generate code.
-
18
ChatGPT Enterprise
OpenAI
ChatGPT Enterprise is the most powerful version yet, with enterprise-grade security and privacy. 1. Customer prompts and data are not used to train models. 2. Data encryption in transit and at rest (TLS 1.2+). 3. SOC 2 compliant. 4. Easy bulk member management and a dedicated admin console. 5. SSO and domain verification. 6. An analytics dashboard for understanding usage. 7. Unlimited, high-speed access to GPT-4 and Advanced Data Analysis. 8. A 32k-token context window for 4x longer inputs, files, and memory. 9. Shareable chat templates to help your company collaborate. -
19
zeemo
zeemo
$7.99 per hour. Upload subtitle and video files to match text to video content, or upload raw video and transcript files without any timeline information and timestamps will be added to the transcript automatically. You can edit online, then download subtitle files or the video with subtitles directly. Supported original video languages include English, Spanish, Simplified Chinese, Traditional Chinese, Cantonese, Japanese, Korean, French, German, Italian, Vietnamese, Arabic, and Portuguese. The single-line word limit is the maximum number of words that can appear in one subtitle line. If a paragraph contains many words, the system makes reasonable cuts based on this limit, ensuring that no subtitle line exceeds it. This improves the subtitle display and makes it easier to read. -
20
Granite Code
IBM
Free. We introduce the Granite family of decoder-only code models for code-generation tasks (e.g., fixing bugs, explaining code, documenting code), trained with code written in 116 programming languages. Evaluated on a variety of tasks, the Granite Code family consistently performs at the top among open-source code LLMs. Granite Code models have a number of key advantages: they perform at a competitive or state-of-the-art level on a variety of code-related tasks, including code generation, explanation, fixing, translation, and editing, demonstrating the ability to solve diverse coding problems. All models are guided by IBM's Corporate Legal team for trustworthy enterprise use and are trained on license-permissible data collected in accordance with IBM's AI Ethics Principles. -
21
Command R+
Cohere
Free. Command R+ is Cohere's latest large language model, optimized for conversational interaction and long-context tasks. It is designed to be extremely performant, enabling companies to move from proof of concept into production. We recommend Command R+ for workflows that rely on complex retrieval-augmented generation (RAG) functionality or multi-step tool use (agents), and Command R for simpler RAG tasks and single-step tool use, or for applications where cost is a key consideration. -
22
LongLLaMA
LongLLaMA
Free. This repository contains a research preview of LongLLaMA, a large language model capable of handling contexts of up to 256k tokens. LongLLaMA is built on the foundation of OpenLLaMA and fine-tuned with the Focused Transformer method; LongLLaMA Code is built on the foundation of Code Llama. We release a smaller base variant of LongLLaMA (not instruction-tuned) under a permissive license (Apache 2.0), along with inference code supporting longer contexts on Hugging Face. Our model weights are a drop-in replacement for LLaMA in existing implementations (for short contexts up to 2048 tokens). We also provide evaluation results and comparisons against the original OpenLLaMA model.
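As a sketch of the drop-in Hugging Face usage described above, the snippet below loads the checkpoint with the repository's custom modeling code enabled. The model ID and settings are assumptions based on the description, so check the repository for the published checkpoint name.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed checkpoint ID; substitute the one published in the repository.
    MODEL_ID = "syzymon/long_llama_3b"

    # trust_remote_code pulls in the repo's custom modeling code, which adds
    # the Focused Transformer long-context attention on top of LLaMA.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float32, trust_remote_code=True
    )

    prompt = "My favourite animal is"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

-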
23
Pixtral 12B
Mistral AI
Free. Pixtral 12B is a groundbreaking multimodal AI model from Mistral AI, designed to process and understand both text and image data seamlessly. It represents a significant advance in the integration of different data types, allowing for more intuitive interaction and enhanced content-creation capabilities. Built on Mistral's NeMo 12B text model, Pixtral 12B incorporates an additional vision adapter that adds 400 million parameters, enabling it to handle visual inputs of up to 1024x1024 pixels. The model supports a wide range of applications, from image analysis to answering questions about visual content, and its versatility has been demonstrated in real-world scenarios. With a large context window of 128k tokens and innovative techniques such as GeLU activation and 2D RoPE for its vision components, Pixtral 12B is a powerful tool for developers. -
24
CADopia
CADopia
CADopia is a powerful computer-aided design (CAD) software for engineers, architects, designers, and drafters -- virtually anyone who creates, edits, or views professional drawings. CADopia 19 is available for download in 12 languages, including Chinese, Czech, English, and German. CADopia Professional Services can help maximize the return on your investment in CAD technology; CADopia offers upfront consulting, custom application development, staff training, technical support, and project outsourcing. Productivity-enhancing drafting tools, such as custom construction planes, entity snaps, grids, and entity and polar tracking, let you complete your drawings accurately and efficiently. -
25
OpenGPT-X
OpenGPT-X
Free. OpenGPT-X is a German initiative focused on developing large AI language models tailored to European needs, emphasizing versatility, trustworthiness, multilingual capabilities, and open-source accessibility. The project brings together partners covering the entire generative AI value chain, from scalable GPU-based infrastructure and data for training large language models, to model design, practical applications, prototypes, and proofs of concept. OpenGPT-X aims to advance cutting-edge research with a focus on business applications, thereby accelerating the adoption of generative AI in the German economy. The project also stresses responsible AI development, ensuring the models are reliable and aligned with European values and laws. It provides resources such as the LLM Workbook and a three-part reference guide with examples to help users better understand the key features and characteristics of large AI language models. -
26
Claude 3 Opus
Anthropic
Free. Opus, our most intelligent model, outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate-level expert knowledge, graduate-level expert reasoning, basic mathematics, and more. It exhibits near-human levels of comprehension and fluency on complex tasks, at the forefront of general intelligence. All Claude 3 models show increased capabilities in analysis and forecasting, nuanced content generation, code generation, and conversing in non-English languages such as Spanish, Japanese, and French. -
27
Chinchilla
Google DeepMind
Chinchilla is a large language model. Chinchilla uses the same compute budget as Gopher but with 70B parameters and 4x more data. It consistently and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a wide range of downstream evaluation tasks. Chinchilla also uses substantially less compute for fine-tuning and inference, greatly facilitating downstream use. Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, a greater than 7% improvement over Gopher.
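A quick back-of-the-envelope check of the "same compute, fewer parameters, more data" claim, using the common C ≈ 6·N·D approximation for training FLOPs. The token counts below are the figures reported for the two models, cited here from memory:

    # Approximate training compute: C ~ 6 * N (parameters) * D (tokens).
    def train_flops(params, tokens):
        return 6 * params * tokens

    gopher = train_flops(280e9, 300e9)       # 280B params, ~300B tokens
    chinchilla = train_flops(70e9, 1.4e12)   # 70B params, ~1.4T tokens

    print(f"Gopher:     {gopher:.2e} FLOPs")
    print(f"Chinchilla: {chinchilla:.2e} FLOPs")
    print(f"Ratio:      {chinchilla / gopher:.2f}")  # ~1.17, roughly equal

-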
28
LTM-1
Magic AI
Magic's LTM-1 provides context windows 50x larger than transformers'. Magic has trained a large language model that can take in huge amounts of context when generating suggestions; with it, our coding assistant can now see all of your code. With larger context windows, AI models can reference more factual and explicit information, as well as their own action history. We hope this research will improve reliability and coherence. -
29
Baichuan-13B
Baichuan Intelligent Technology
Free. Baichuan-13B is an open-source, commercially available large-scale language model with 13 billion parameters, developed by Baichuan Intelligent as a follow-up to Baichuan-7B. It achieves the best results among language models of the same size on authoritative Chinese and English benchmarks. This release includes two versions: pre-training (Baichuan-13B-Base) and alignment (Baichuan-13B-Chat). Baichuan-13B expands the parameter count to 13 billion on the basis of Baichuan-7B and trains on 1.4 trillion tokens of high-quality corpus, 40% more than LLaMA-13B, making it the open-source 13B model with the most training data to date. It supports both Chinese and English, uses ALiBi position encoding, and has a context window of 4096 tokens. -
30
Qwen
Alibaba
Free. Qwen LLM is a family of large language models (LLMs) developed by Damo Academy, an Alibaba Cloud subsidiary. These models are trained on a large dataset of text and code, allowing them to understand and generate human-like text, translate languages, create different types of creative content, and answer questions in an informative manner. Here are some key features of Qwen LLMs. Variety of sizes: the Qwen series ranges from 1.8 billion to 72 billion parameters, offering options for different needs and performance levels. Open source: certain versions of Qwen are open source, available for anyone to use and modify. Multilingual: Qwen can understand and translate multiple languages, including English, Chinese, and Japanese. Diverse capabilities: Qwen models can handle a wide range of tasks, including text generation, summarization, translation, and code generation. -
31
StableCode
Stability AI
StableCode is a unique tool that helps developers become more efficient in their coding by using three models. The base model was first trained on a diverse collection of programming languages using the Stack dataset from BigCode, and then further trained on popular languages such as Python, Go, Java, JavaScript, C, Markdown, and C++. We trained our models using 560B tokens on our HPC cluster. Once the base model was established, the instruction model was tuned for specific use cases to help solve complex programming problems; to achieve this, roughly 120,000 code instruction/response pairs in Alpaca format were used to train the base model. StableCode is a great tool for anyone who wants to learn more about programming, and the long-context model is a great assistant for providing single- and multi-line autocomplete suggestions to the user. This model is designed to handle a large amount of code at once. -
32
Megatron-Turing
NVIDIA
The Megatron-Turing Natural Language Generation model (MT-NLG), with 530 billion parameters, is the largest and most powerful monolithic English language model. This 105-layer, transformer-based MT-NLG improves upon prior state-of-the-art models in zero-, one-, and few-shot settings. It demonstrates unmatched accuracy across a wide range of natural language tasks, including completion prediction and reading comprehension. NVIDIA has announced an Early Access program for its managed API service for MT-NLG, which will allow customers to experiment with and apply large language models on downstream language tasks. -
33
ExaminLab
Arca24
€109 for 10 credits. ExaminLab is an online platform by Arca24 that allows you to assess and measure the linguistic and professional skills of your candidates and employees. Tests are based on CAT technology (Computerized Adaptive Testing), which adapts question difficulty to the answers the candidate has already given: a correct answer leads to harder questions, an incorrect one to easier questions. Tests can be taken remotely from any device and are available in 6 languages. A growing test library provides you with the best tools to screen your candidates and employees so you can make better, faster, and easier hiring decisions: - International languages (English, French, Italian, German, Spanish, Portuguese) - Non-European languages (Chinese, Japanese, Arabic, and Russian) - Business languages - Microsoft Office and accounting (verification of skills in the use of Word, Excel, PowerPoint, messaging tools, etc.). -
34
TntConnect
TntWare
TntConnect is a program that helps you manage your relationships with ministry partners. It is intended for missionaries who raise their own support, but anyone can use it. Sharing TntConnect with another missionary helps make sure you have more time for the things God has called you to do. TntConnect is completely free of charge to download, use, and share with your friends, and I hope you find it useful in your ministry. TntConnect is available for download in Arabic, Dutch, English, French, German, Japanese, Korean, Portuguese, Russian, Simplified Chinese, and Spanish. -
35
hihaho
hihaho
97 euro per video. Research shows that 70% of B2B buyers prefer video during the customer journey, and interactive video can reduce learning time by 40-60% compared to traditional methods. Upload your own video or use one from YouTube or Vimeo, then add questions, chapters, buttons, and links in our video maker. Share it however you like: via a customized URL, an embed code, or through SCORM and xAPI, and watch it on any device. Get a detailed look at how viewers used the video, then adjust your approach based on this information. -
36
Gemma 2
Google
Gemma is a family of lightweight, state-of-the-art open models created using the same research and technology as the Gemini models. These models include comprehensive safety measures and help ensure responsible and reliable AI through curated datasets. Gemma models achieve exceptional benchmark results, even surpassing some larger open models, at their 2B and 7B sizes. Through Keras 3.0, they offer seamless compatibility with JAX, TensorFlow, and PyTorch. Gemma 2 has been redesigned for unmatched performance and efficiency, and is optimized for inference on a variety of hardware. The Gemma family comes in a variety of models that can be customized to meet your specific needs. Gemma models are lightweight, text-to-text, decoder-only large language models trained on a large dataset of text, code, and mathematical content. -
37
DocTranslator
Translation Cloud
$0.004 per word. 6 Ratings. Translate any MS Word .DOCX file, Excel spreadsheet, PowerPoint presentation, Adobe PDF, or InDesign file into more than 100 languages: English, Spanish, French, German, Dutch, Danish, Japanese, Korean, Russian, Portuguese, and more. Doc Translator uses neural machine translation technology that provides human-like quality (80-90% accuracy), preserves the original layout, and offers same-day turnaround for large documents. -
38
CodeT5
Salesforce
Code for CodeT5, a new code-aware, identifier-aware, pre-trained encoder-decoder model. This is the official PyTorch implementation of the EMNLP 2021 paper from Salesforce Research. CodeT5-large-ntp-py was optimized for Python code generation and used as the foundation model in our CodeRL, yielding new SOTA results on the APPS Python competition-level program synthesis benchmark. This repository contains the code to reproduce the experiments in CodeT5. CodeT5 is a pre-trained encoder-decoder model for programming languages, pre-trained on 8.35M functions from 8 programming languages (Python, Java, JavaScript, PHP, Ruby, Go, C, and C#). It achieves state-of-the-art results on the CodeXGLUE code intelligence benchmark and can generate code from natural language descriptions.
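As a rough illustration of using CodeT5 via Hugging Face transformers, the sketch below asks the model to fill a masked span in a Python function. The checkpoint name and sentinel-token usage follow the public model card as best I recall, so treat the details as assumptions.

    from transformers import RobertaTokenizer, T5ForConditionalGeneration

    # Assumed checkpoint ID from the Hugging Face hub.
    MODEL_ID = "Salesforce/codet5-base"

    tokenizer = RobertaTokenizer.from_pretrained(MODEL_ID)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)

    # <extra_id_0> is T5's sentinel token marking the span to predict.
    text = "def greet(user): print(f'hello <extra_id_0>!')"
    input_ids = tokenizer(text, return_tensors="pt").input_ids

    generated_ids = model.generate(input_ids, max_length=8)
    print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

-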
39
Mathstral
Mistral AI
As a tribute to Archimedes, whose 2311th anniversary we are celebrating this year, we are releasing our first Mathstral 7B model, designed specifically for math reasoning and scientific discovery. The model has a 32k context window and is published under the Apache 2.0 license. We are contributing Mathstral to the science community to help solve advanced mathematical problems that require complex, multi-step logical reasoning. The Mathstral release is part of our broader effort to support academic projects and was produced in the context of our collaboration with Project Numina. Like Isaac Newton in his time, Mathstral stands on the shoulders of Mistral 7B and specializes in STEM. Based on industry-standard benchmarks, it achieves state-of-the-art reasoning in its size category, scoring 56.6% on MATH and 63.47% on MMLU, with notable MMLU performance gains over Mistral 7B across subjects. -
40
XLNet
XLNet
Free. XLNet is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective. It uses Transformer-XL as its backbone model, which makes it excellent for language tasks involving long context. Overall, XLNet achieves state-of-the-art (SOTA) results on various downstream language tasks, including question answering, natural language inference, sentiment analysis, and document ranking.
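For reference, the generalized permutation language modeling objective from the XLNet paper maximizes the expected log-likelihood over factorization orders; the notation below is reproduced from memory, so consult the paper for the exact statement:

    \max_{\theta} \; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
        \left[ \sum_{t=1}^{T} \log p_{\theta}\!\left( x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}} \right) \right]

Here $\mathcal{Z}_T$ is the set of all permutations of the index sequence $[1, \dots, T]$, $z_t$ is the $t$-th element of a permutation $\mathbf{z}$, and $\mathbf{x}_{\mathbf{z}_{<t}}$ denotes the tokens that precede $x_{z_t}$ under that factorization order.

-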
41
GPT-NeoX
EleutherAI
Free. GPT-NeoX is an implementation of model-parallel autoregressive transformers on GPUs, based on the DeepSpeed library. This repository contains EleutherAI's library for training large language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been enhanced with techniques from DeepSpeed as well as some novel improvements. This repo is intended to be a central and accessible place for techniques to train large-scale autoregressive models and to accelerate research into large-scale training. -
42
NVIDIA NeMo Megatron
NVIDIA
NVIDIA NeMo Megatron is an end-to-end framework for training and deploying LLMs with billions or trillions of parameters. Part of the NVIDIA AI platform, NeMo Megatron offers an efficient, cost-effective, containerized approach to building and deploying LLMs. Designed for enterprise application development, it builds upon the most advanced technologies from NVIDIA research and provides an end-to-end workflow for automated distributed data processing; training large-scale, customized GPT-3 and T5 models; and deploying those models for inference at scale. Validated and converged recipes for training and inference are key to unlocking the power and potential of LLMs. The hyperparameter tool makes it easy to customize models: it automatically searches for the best hyperparameter configurations and performance for training and inference on any given distributed GPU cluster configuration. -
43
CodeQwen
QwenLM
Free. CodeQwen, developed by the Qwen team at Alibaba Cloud, is the code version of Qwen. It is a transformer-based, decoder-only language model pre-trained on a large amount of code data. It demonstrates strong code-generation capabilities and competitive performance across a series of benchmarks, and supports long-context understanding and generation with a context length of 64K tokens. CodeQwen supports 92 coding languages and provides excellent performance on text-to-SQL, bug fixing, and more. Chatting with CodeQwen is as simple as writing a few lines of code with transformers: build the tokenizer and model with the from_pretrained methods, then use the generate method to chat, with the chat template provided by the tokenizer. Following our previous practice, we apply the ChatML template for chat models. The model completes code snippets according to the prompts, without any additional formatting.
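A minimal sketch of the transformers chat flow just described, assuming a CodeQwen chat checkpoint on the Hugging Face hub (the model ID below is an assumption based on the naming in the description):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed chat checkpoint; substitute the ID from the official repo.
    MODEL_ID = "Qwen/CodeQwen1.5-7B-Chat"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    messages = [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a quicksort function in Python."},
    ]
    # The tokenizer supplies the ChatML template mentioned above.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=256)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))

-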
44
DeepSeek LLM
DeepSeek
Introducing DeepSeek LLM, an advanced language model with 67 billion parameters. It was trained from scratch on a massive dataset of 2 trillion tokens in both English and Chinese. To encourage research, we have made DeepSeek LLM 67B Base and DeepSeek LLM 67B Chat available as open source to the research community. -
45
Aya
Cohere AI
Aya is an open-source, state-of-the-art, massively multilingual large language research model (LLM) covering 101 different languages, more than double the number covered by existing open-source models. Aya helps researchers unlock the powerful potential of LLMs for dozens of languages and cultures largely ignored by the most advanced models available today. We are open-sourcing both the Aya model and the largest multilingual instruction dataset to date, with 513 million entries covering 114 languages. This data collection includes rare annotations from native and fluent speakers around the world, ensuring that AI technology can effectively serve a broad global audience that has had limited access until now. -
46
PanGu-Σ
Huawei
The scaling of large language models has led to significant advancements in natural language processing, understanding, and generation. This study introduces a system that uses Ascend 910 AI processors and the MindSpore framework to train a language model with over one trillion parameters (1.085T, specifically), called PanGu-Σ. Building on the foundation laid by PanGu-α, this model extends the traditional dense Transformer into a sparse one using a concept called Random Routed Experts (RRE). The model was trained efficiently on a dataset of 329 billion tokens using a technique known as Expert Computation and Storage Separation (ECSS), which led to a 6.3x increase in training throughput via heterogeneous computing. Experiments show that PanGu-Σ sets a new standard for zero-shot learning on various downstream Chinese NLP tasks. -
47
Alpa
Alpa
Free. Alpa aims to automate large-scale distributed training. Alpa was originally developed by people in the Sky Lab at UC Berkeley, and its advanced techniques were described in a paper published at OSDI 2022. The Alpa community is growing, with new contributors from Google. A language model is a probability distribution over sequences of words: it uses all the words it has seen so far to predict the next word. It is useful in a variety of AI applications, such as email auto-completion and chatbot services; you can find more information on the language model Wikipedia page. GPT-3 is a very large language model, with 175 billion parameters, that uses deep learning to produce human-like text. Many researchers and news articles described GPT-3 as "one of the most important and interesting AI systems ever created," and it is being used as a backbone for the latest NLP research.
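In symbols, the next-word view just described is the chain-rule factorization of sequence probability (standard notation, not specific to Alpa):

    p(w_1, \dots, w_n) = \prod_{t=1}^{n} p(w_t \mid w_1, \dots, w_{t-1})

An autoregressive model such as GPT-3 learns the conditional factor $p(w_t \mid w_1, \dots, w_{t-1})$ and generates text by repeatedly sampling the next word.

-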
48
LLaMA
Meta
LLaMA (Large Language Model Meta AI) is a state-of-the-art foundational large language model created to aid researchers in this subfield of AI. LLaMA allows researchers to use smaller, more efficient models to study these models, further democratizing access to this rapidly changing field. Training smaller foundation models like LLaMA is desirable because it takes far less computing power and resources to test new approaches, validate others' work, and explore new use cases. Foundation models are trained on large amounts of unlabeled data, which makes them ideal for fine-tuning on many tasks. We are making LLaMA available in several sizes (7B, 13B, 33B, and 65B parameters) and also sharing a LLaMA model card that explains how the model was built in line with our Responsible AI practices. -
49
NLP Cloud
NLP Cloud
$29 per month. Production-ready AI models that are fast and accurate, served through a high-availability inference API leveraging the most advanced NVIDIA GPUs. We have selected the most popular open-source natural language processing (NLP) models and deployed them for the community. You can fine-tune your own models (including GPT-J) or upload your custom models, deploy them to production from your dashboard, and immediately use them in production. -
50
LLaVA
LLaVA
Free. LLaVA (Large Language and Vision Assistant) is a multimodal model that combines a Vicuna language model with a vision encoder to facilitate comprehensive visual-language understanding. LLaVA's chat capabilities are impressive, emulating the multimodal functionality of models such as GPT-4. LLaVA-1.5 achieved the best performance on 11 benchmarks using publicly available data, completing training on a single 8-A100 node in about one day and beating methods that rely on billion-scale datasets. Development of LLaVA involved creating a multimodal instruction-following dataset generated with language-only GPT-4, comprising 158,000 unique language-image instruction-following samples, including conversations, detailed descriptions, and complex reasoning tasks. This data proved crucial in training LLaVA for a wide range of visual and linguistic tasks.