Collections

Discover the best community collections!

Collections including paper arxiv:2412.19437

DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 73

speechbrain/sepformer-whamr

Audio-to-Audio • Updated Feb 19, 2024 • 523 • 14
DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 73

DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 73

DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 73

deepseek-ai/DeepSeek-V3-Base

685B • Updated Mar 27 • 10.9k • 1.68k
deepseek-ai/DeepSeek-V3

Text Generation • 685B • Updated Mar 27 • 682k • 4k
DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 73
deepseek-ai/DeepSeek-V3-0324

Text Generation • 685B • Updated Mar 27 • 143k • 3.08k

Cosmos World Foundation Model Platform for Physical AI

Paper • 2501.03575 • Published Jan 7 • 81
Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 122
MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 301
DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 73

Research Papers

DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 73

Learn: LLM Architecture 2025

RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper • 2104.09864 • Published Apr 20, 2021 • 16
DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 73

deepseek-ai/DeepSeek-V3-Base

685B • Updated Mar 27 • 10.9k • 1.68k
TransMLA: Multi-head Latent Attention Is All You Need

Paper • 2502.07864 • Published Feb 11 • 58
Qwen2.5 Bakeneko 32b Instruct Awq

Sleeping • 2 • Generate detailed responses to text prompts
Deepseek R1 Distill Qwen2.5 Bakeneko 32b Awq

Sleeping • 3 • Generate text responses to user messages in a chat interface

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 73
Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published Feb 19 • 69
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Paper • 2506.19767 • Published Jun 24 • 15
