Average Ratings 0 Ratings
Average Ratings 0 Ratings
Description
DeepSeekMath is an advanced 7B parameter language model created by DeepSeek-AI, specifically engineered to enhance mathematical reasoning capabilities within open-source language models. Building upon the foundation of DeepSeek-Coder-v1.5, this model undergoes additional pre-training utilizing 120 billion math-related tokens gathered from Common Crawl, complemented by data from natural language and coding sources. It has shown exceptional outcomes, achieving a score of 51.7% on the challenging MATH benchmark without relying on external tools or voting systems, positioning itself as a strong contender against models like Gemini-Ultra and GPT-4. The model's prowess is further bolstered by a carefully curated data selection pipeline and the implementation of Group Relative Policy Optimization (GRPO), which improves both its mathematical reasoning skills and efficiency in memory usage. DeepSeekMath is offered in various formats including base, instruct, and reinforcement learning (RL) versions, catering to both research and commercial interests, and is intended for individuals eager to delve into or leverage sophisticated mathematical problem-solving in the realm of artificial intelligence. Its versatility makes it a valuable resource for researchers and practitioners alike, driving innovation in AI-driven mathematics.
Description
StarCoder and StarCoderBase represent advanced Large Language Models specifically designed for code, developed using openly licensed data from GitHub, which encompasses over 80 programming languages, Git commits, GitHub issues, and Jupyter notebooks. In a manner akin to LLaMA, we constructed a model with approximately 15 billion parameters trained on a staggering 1 trillion tokens. Furthermore, we tailored the StarCoderBase model with 35 billion Python tokens, leading to the creation of what we now refer to as StarCoder.
Our evaluations indicated that StarCoderBase surpasses other existing open Code LLMs when tested against popular programming benchmarks and performs on par with or even exceeds proprietary models like code-cushman-001 from OpenAI, the original Codex model that fueled early iterations of GitHub Copilot. With an impressive context length exceeding 8,000 tokens, the StarCoder models possess the capability to handle more information than any other open LLM, thus paving the way for a variety of innovative applications. This versatility is highlighted by our ability to prompt the StarCoder models through a sequence of dialogues, effectively transforming them into dynamic technical assistants that can provide support in diverse programming tasks.
API Access
Has API
API Access
Has API
Integrations
ChatGPT
CodeQwen
Git
GitHub
LM Studio
OpenAI
Python
Tabby
Taylor AI
Visual Studio Code
Integrations
ChatGPT
CodeQwen
Git
GitHub
LM Studio
OpenAI
Python
Tabby
Taylor AI
Visual Studio Code
Pricing Details
Free
Free Trial
Free Version
Pricing Details
Free
Free Trial
Free Version
Deployment
Web-Based
On-Premises
iPhone App
iPad App
Android App
Windows
Mac
Linux
Chromebook
Deployment
Web-Based
On-Premises
iPhone App
iPad App
Android App
Windows
Mac
Linux
Chromebook
Customer Support
Business Hours
Live Rep (24/7)
Online Support
Customer Support
Business Hours
Live Rep (24/7)
Online Support
Types of Training
Training Docs
Webinars
Live Training (Online)
In Person
Types of Training
Training Docs
Webinars
Live Training (Online)
In Person
Vendor Details
Company Name
DeepSeek
Founded
2023
Country
China
Website
deepseek.com
Vendor Details
Company Name
BigCode
Founded
2023
Website
huggingface.co/blog/starcoder