DeepSeek R1

DeepSeek
Best for: reasoning
Benchmark               Score    Categories
MATH / AIME             97.3%    reasoning
HumanEval / MBPP        92.0%    coding
MMLU                    90.8%    general
GPQA (Diamond)          71.5%    reasoning
Chatbot Arena (LMSYS)   1358     general
LiveCodeBench           65.9%    coding
SWE-bench               49.2%    agentic, coding
TAU-bench               -        agentic, multiagent
GAIA                    -        agentic, multiagent
WebArena                -        agentic
MT-Bench                -        general
AgentBench              -        multiagent
IFEval                  -        general, agentic
SimpleQA                -        general

Data unavailable


Context window
Max output tokens
Input modalities
Output modalities
Supports reasoning
Supports tool use