Grok 3 Mini
radar chart — all benchmarks
family — grok
benchmark scores
Scores from Mar 2026
| Benchmark | Score | Categories |
|---|---|---|
| MATH / AIME | 95.8% | reasoning |
| LiveCodeBench | 80.4% | coding |
| Chatbot Arena (LMSYS) | 1366 | general |
| SimpleQA | 21.7% | general |
| SWE-bench | — | agentic, coding |
| HumanEval / MBPP | — | coding |
| MMLU | — | general |
| GPQA (Diamond) | — | reasoning |
| TAU-bench | — | agentic, multiagent |
| GAIA | — | agentic, multiagent |
| WebArena | — | agentic |
| MT-Bench | — | general |
| AgentBench | — | multiagent |
| IFEval | — | general, agentic |
pricing — per 1M tokens via openrouter
Data unavailable
latency percentiles — time to first token (ms)
Data unavailable
model specifications
Context window —
Max output tokens —
Input modalities —
Output modalities —
Supports reasoning —
Supports tool use —