Grok 3

xAI
Best for: reasoning
Categories
Benchmark               Score    Categories
MATH / AIME             93.3%    reasoning
MMLU                    92.7%    general
HumanEval / MBPP        86.5%    coding
GPQA (Diamond)          84.0%    reasoning
LiveCodeBench           79.4%    coding
Chatbot Arena (LMSYS)   1423     general
SimpleQA                43.6%    general
SWE-bench               -        agentic, coding
TAU-bench               -        agentic, multiagent
GAIA                    -        agentic, multiagent
WebArena                -        agentic
MT-Bench                -        general
AgentBench              -        multiagent
IFEval                  -        general, agentic

Data unavailable

Context window
Max output tokens
Input modalities
Output modalities
Supports reasoning
Supports tool use