
Best AI Model for Math & Scientific Reasoning

For solving math problems, scientific analysis, physics simulations, and formal reasoning. These tasks call for top-tier GPQA scores.

Our Verdict

Gemini 3.1 Pro leads this category: its 94.3% GPQA Diamond score is the highest of any model in this ranking. For budget math, o3 at $0.40/$1.60 (input/output per 1M tokens) offers the best value, pairing 79.2% GPQA with chain-of-thought reasoning. DeepSeek R1 at $0.55/$2.19 is another budget reasoning option with strong math scores. Avoid budget non-reasoning models for math: they tend to hallucinate numbers and skip steps.

Top Picks

#1 Gemini 3.1 Pro (Google)

94.3% GPQA Diamond, the highest of any model in this ranking

Best for: Graduate-level science and complex math

Input: $2.00/1M tokens
Output: $12.00/1M tokens
Context: 1M
Max Output: 64K

MMLU-Pro: 91% | HumanEval: 95% | GPQA: 94.3%

#2 o3 (OpenAI)

79.2% GPQA with chain-of-thought at $0.40/$1.60

Best for: Value-focused math reasoning

Input: $0.40/1M tokens
Output: $1.60/1M tokens
Context: 200K
Max Output: 100K

MMLU-Pro: 87% | HumanEval: 94.5% | GPQA: 79.2%

#3 DeepSeek R1 (DeepSeek)

71.5% GPQA at $0.55/$2.19, a low-cost reasoning model (o3 is priced slightly lower)

Best for: Budget math with reasoning chains

Input: $0.55/1M tokens
Output: $2.19/1M tokens
Context: 128K
Max Output: 64K

MMLU-Pro: 84% | HumanEval: 92% | GPQA: 71.5%
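
To make the price pairs above concrete, here is a minimal sketch that estimates the cost of a single request for each top pick. It assumes the listed prices are USD per 1M tokens, as the "/1M" labels suggest. The function name `request_cost` and the workload of 2,000 input tokens and 8,000 output tokens are hypothetical, chosen only for illustration.

```python
# Prices copied from the Top Picks cards: (input, output).
# Assumption: both are USD per 1M tokens.
PRICES = {
    "Gemini 3.1 Pro": (2.00, 12.00),
    "o3": (0.40, 1.60),
    "DeepSeek R1": (0.55, 2.19),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a short prompt and a long step-by-step answer.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 2_000, 8_000):.4f}")
```

Under these assumptions the request costs roughly $0.10 on Gemini 3.1 Pro, $0.014 on o3, and $0.019 on DeepSeek R1. Output price dominates whenever answers are much longer than prompts.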

What Matters for Math & Science

Key Factors

  • GPQA score
  • Reasoning chains
  • Mathematical accuracy

Tips

  • Reasoning models (o3, DeepSeek R1) dominate here with chain-of-thought
  • Gemini 3.1 Pro has the highest GPQA Diamond score (94.3%)
  • Don't use budget non-reasoning models for math: accuracy drops significantly

Full Ranking (All Compatible Models)

Prices are per 1M tokens.

Rank  Model              Provider   Input   Output  GPQA   Score
#1    Gemini 3.1 Pro     Google     $2.00   $12.00  94.3%  139
#2    o3                 OpenAI     $0.40   $1.60   79.2%  134
#3    DeepSeek R1        DeepSeek   $0.55   $2.19   71.5%  125
#4    o4-mini            OpenAI     $1.10   $4.40   76%    105
#5    Gemini 3 Pro       Google     $2.00   $12.00  77%    103
#6    GLM-5              Zhipu AI   $1.00   $3.20   72%    103
#7    GPT-5.3 Codex      OpenAI     $2.00   $16.00  78%    103
#8    GLM-4.7            Zhipu AI   $0.60   $2.20   85.7%  103
#9    GPT-5.2 Codex      OpenAI     $1.75   $14.00  76%    102
#10   Gemini 2.5 Pro     Google     $1.25   $10.00  76%    93
#11   Claude Opus 4.6    Anthropic  $5.00   $25.00  75.5%  92
#12   MiniMax M2.5       MiniMax    $0.30   $1.20   86.0%  87
#13   GPT-5              OpenAI     $1.25   $10.00  73.5%  87
#14   Gemini 2.5 Flash   Google     $0.15   $0.60   82.8%  86
#15   Gemini 3 Flash     Google     $0.50   $3.00   84.0%  85
#16   Grok 4             xAI        $3.00   $15.00  72%    84
#17   Mistral Large 3    Mistral    $2.00   $5.00   87.0%  80
#18   Claude Sonnet 4.6  Anthropic  $3.00   $15.00  70%    77
#19   Claude Sonnet 4.5  Anthropic  $3.00   $15.00  68.2%  75
#20   DeepSeek V3        DeepSeek   $0.14   $0.28   83.5%  74
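
The page does not say how the Score column is computed, so readers who care mainly about GPQA within a price ceiling may want to re-sort the rows themselves. The following is a minimal sketch under that assumption. The tuples are copied from the Full Ranking table, while the helper name `best_under_budget` and the choice of output price as the cap are hypothetical.

```python
# (model, input $/1M, output $/1M, GPQA %) copied from the Full Ranking table.
MODELS = [
    ("Gemini 3.1 Pro", 2.00, 12.00, 94.3),
    ("o3", 0.40, 1.60, 79.2),
    ("DeepSeek R1", 0.55, 2.19, 71.5),
    ("o4-mini", 1.10, 4.40, 76.0),
    ("Gemini 3 Pro", 2.00, 12.00, 77.0),
    ("GLM-5", 1.00, 3.20, 72.0),
    ("GPT-5.3 Codex", 2.00, 16.00, 78.0),
    ("GLM-4.7", 0.60, 2.20, 85.7),
    ("GPT-5.2 Codex", 1.75, 14.00, 76.0),
    ("Gemini 2.5 Pro", 1.25, 10.00, 76.0),
    ("Claude Opus 4.6", 5.00, 25.00, 75.5),
    ("MiniMax M2.5", 0.30, 1.20, 86.0),
    ("GPT-5", 1.25, 10.00, 73.5),
    ("Gemini 2.5 Flash", 0.15, 0.60, 82.8),
    ("Gemini 3 Flash", 0.50, 3.00, 84.0),
    ("Grok 4", 3.00, 15.00, 72.0),
    ("Mistral Large 3", 2.00, 5.00, 87.0),
    ("Claude Sonnet 4.6", 3.00, 15.00, 70.0),
    ("Claude Sonnet 4.5", 3.00, 15.00, 68.2),
    ("DeepSeek V3", 0.14, 0.28, 83.5),
]


def best_under_budget(max_output_price, top_n=3):
    """Return the top_n rows by GPQA whose output price is at or below the cap."""
    affordable = [m for m in MODELS if m[2] <= max_output_price]
    return sorted(affordable, key=lambda m: m[3], reverse=True)[:top_n]


if __name__ == "__main__":
    # Hypothetical cap: at most $3.00 per 1M output tokens.
    for name, _, output_price, gpqa in best_under_budget(3.00):
        print(f"{name}: GPQA {gpqa}% at ${output_price:.2f}/1M output")
```

GPQA is only one input to a model choice. Treat this as a complementary view of the table, not a replacement for the page's ranking, which appears to weigh other factors such as reasoning capability.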
