AI Hype Tracker
A site that keeps score. Longitudinal claim tracking, a transparent weekly index, and a daily digest from a deliberately balanced source diet.
HYPE INDEX · WEEK 2026-20
methodology →
A single number from 0 to 100 capturing how hyped the AI conversation is this week. Built from the four components below, each scored 0–25 and each open to inspection. Higher means more capability-claim volume, funding pressure, and mainstream media saturation, and less skeptic friction. That last component is inverted, so a noisy skeptic press lowers the score.
Trend chart populates as weeks accumulate.
Capability-claim volume: GPT-5 launch commentary spiked ‘phase change’ language; eval dispersion tempered the tails.
Funding pressure: Mega-cap capex guides held elevated; vendor financing and GPU backlog stories stayed on page one.
Mainstream media saturation: Mainstream front pages ran AI explainers tied to GDP, jobs, and ‘sci-fi’ ledes. That is attention, not evidence.
Skeptic friction: Bearish equity notes and ROI skepticism were audible but not dominant; friction rose modestly vs the prior week.
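For readers who want to see how four 0–25 components could roll up into a 0–100 number, here is a minimal sketch. It rests on two assumptions the summary above does not spell out: that the components are simply summed, and that skeptic friction is inverted as 25 minus its raw score. The field names and example values are hypothetical; the methodology page is the authority on the actual formula.

```python
from dataclasses import dataclass

COMPONENT_MAX = 25  # each component is scored 0-25


@dataclass(frozen=True)
class WeeklyScores:
    # Hypothetical field names for the four components.
    capability_claims: float
    funding_pressure: float
    media_saturation: float
    skeptic_friction: float  # raw friction: higher means a louder skeptic press


def hype_index(scores: WeeklyScores) -> float:
    """Combine four 0-25 component scores into a 0-100 index.

    Assumption: components are summed, and skeptic friction is
    inverted as (25 - friction). The real formula may differ.
    """
    values = (
        scores.capability_claims,
        scores.funding_pressure,
        scores.media_saturation,
        scores.skeptic_friction,
    )
    for value in values:
        if not 0 <= value <= COMPONENT_MAX:
            raise ValueError(f"component score {value} is outside 0-{COMPONENT_MAX}")

    # More friction means less hype, so the friction score is flipped.
    inverted_friction = COMPONENT_MAX - scores.skeptic_friction
    return (
        scores.capability_claims
        + scores.funding_pressure
        + scores.media_saturation
        + inverted_friction
    )


# Illustrative inputs only, not the site's published scores.
example = WeeklyScores(
    capability_claims=20,
    funding_pressure=18,
    media_saturation=15,
    skeptic_friction=10,
)
print(hype_index(example))  # 20 + 18 + 15 + (25 - 10) = 68
```

Under these assumptions, a week with maximal claims, funding, and media scores and zero skeptic friction would reach 100, and the reverse would reach 0.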
DIGEST · SUNDAY, MAY 17
read full →
- The New York Times
- The Economist
- Protocol
RECENT ESSAYS · 35 IN TOTAL
all essays →
Hype vs Reality
GPT-5 release: capability deltas vs the narrative
Measured comparison of what shipped against the pre-release framing — and why the "phase transition" rhetoric mostly didn't survive contact with the benchmarks.
Model Comparisons
Reasoning models — o1 → o3 → DeepSeek R1 → Claude Opus 4.x thinking
What's actually new in the reasoning-model wave, where the capability ceilings sit, and which benchmarks are starting to get gamed.
Hype vs Reality
Agentic coding: Cursor, Devin, Claude Code, Replit Agent — adoption data vs marketing decks
Where the published adoption metrics actually land for each agentic coding product, and what gets quietly conflated when vendors talk "AI software engineer."
Industry & Investment
The DeepSeek pressure: have inference prices actually collapsed?
Three months after the price-war narrative crystallized, what's happened to enterprise inference economics — and what the frontier labs' price-card revisions actually reveal.
Technical Deep Dives
SWE-bench is broken: how coding evals get gamed and what replaces them
How the canonical agentic-coding benchmark is being optimized against, the Anthropic eval-paper findings, and what a credible coding eval looks like from 2026 onward.
Hype vs Reality
AI productivity papers: Goldman, MIT, BCG — what they actually show and don't
The three most-cited 2024–2026 papers on AI's contribution to productivity, the methodological caveats their summaries skip, and what would constitute durable productivity evidence.
COMPANIES · 10
Anthropic · Cohere · Cursor · DeepSeek · Google DeepMind · Meta AI (FAIR) · Mistral AI · OpenAI · Perplexity · xAI
MODELS · 19
Claude 3.5 Sonnet · Claude Mythos Preview · Claude Opus 4.x · Command R+ · DeepSeek R1 · DeepSeek V3 · DeepSeek V3.2 · DeepSeek V4-Pro · Gemini 2.x · Gemma 2 · GPT-4 · GPT-5 · GPT-5.5 · Grok 3 · Llama 3 · Llama 4 · Mistral Large · OpenAI o1 · OpenAI o3