3Blue1Brown (Aleph 0 guest) — The AI that solved IMO Geometry Problems
Why this is in the vault
13-minute Aleph 0 guest video on 3Blue1Brown unpacking Google DeepMind’s January 2024 AlphaGeometry paper. The vault keeps it for two reasons that aren’t the headline. (1) The non-AI baseline did most of the work — a 25-year-old purely-symbolic deductive database (Chou, Gao & Zhang, October 2000) plus an algebraic-reasoning module plus 75 hand-coded heuristics already solved 18 of 30 IMO problems before any neural network entered the loop; the language model’s only job in the final hybrid is to propose auxiliary constructions (the creative leap that opens search space). This is the cleanest single-source case in the vault for the editorial discipline of always foregrounding the symbolic baseline before crediting the LM. (2) The neuro-symbolic architecture (LM proposes, deterministic engine disposes) is the load-bearing pattern for RDCO’s own skill ecology — every skill should be an LM-creative-brain wrapped by a deterministic verifier, with the AlphaGeometry → DD+AR loop as the canonical reference design. The synthetic-training-data trick (reverse-running the symbolic solver to generate hundreds of millions of proofs) is a transferable technique for any RDCO domain where labeled data is scarce.
Episode summary
13-minute Aleph 0 (Aditya Chakravarti) guest video in Grant Sanderson’s 3Blue1Brown summer-2025 guest series. Walks through Google DeepMind’s January 2024 AlphaGeometry paper: an AI system that solved 25 of 30 IMO-level geometry problems (silver-medalist territory). The load-bearing argument is not “AI solved the IMO” — it’s that a 25-year-old purely-symbolic technique (DD + AR) already got 18 of 30 right with no AI at all, and that the AI’s only job in the final hybrid is to propose auxiliary constructions (the creative leap that opens search space for the symbolic solver). Closes on the broader thesis: the right architecture for hard reasoning is creativity (LM) + logic (symbolic), not either alone.
Key arguments / segments
- [00:00:00] The framing flip. AlphaGeometry’s headline was “AI solves IMO.” The under-told story is what the non-AI baseline already achieved: 18/30 with pure logic + algebra + 75 hand-coded heuristics, before any neural network entered the loop.
- [00:01:30] The two foundational geometry facts. Opposite angles equal at a crossing; alternate angles equal across parallel lines (the “Z” rule). With just these two facts, you can prove non-trivial theorems by chaining applications. Now imagine starting with 75 facts.
- [00:02:30] DD — Deductive Database (Chou, Gao & Zhang, October 2000). A 75-rule database of geometric facts; the algorithm chains rules to deduce new theorems from a problem statement. Encodes problems in a niche geometry-specific language (Lean is great for algebra/inequalities but not Euclidean geometry).
- [00:04:15] DD baseline result: 7 of 30 IMO problems. Brute-force-only — not even an honorable mention, but the fact that any IMO problem yields to brute force is itself surprising.
- [00:05:00] DD’s hard limit: it can’t solve equations. Walks through Thales’s theorem (the angle in a semicircle is always 90°) — provable only by setting up 2α + 2β = 180° and solving. DD has no machinery for this.
- [00:05:45] AR — Algebraic Reasoning. Adds a linear-algebra solver. DeepMind alternates DD → AR → DD → AR until the problem is solved or time runs out. Result: DD + AR solves 14 of 30. Adding “human-coded heuristics” pushes it to 18 of 30 (near bronze medal). All without AI.
- [00:07:00] The fundamental weakness: auxiliary constructions. Many hard geometry proofs require adding lines or shapes that aren’t in the diagram. Example: proving angles of a triangle sum to 180° elegantly requires drawing two parallel lines through the apex. The search space of “what to add” is infinite at every step. This is where AI enters.
- [00:08:30] The DeepMind hybrid: language model proposes, DD+AR disposes. A trained LM whose only job is to output auxiliary constructions in the geometry coding language. Loop: LM proposes a construction → DD+AR runs to exhaustion → output fed back to LM → next construction → repeat. The LM is the “creative brain”; DD+AR is the “logical brain.”
- [00:10:00] The synthetic-training-data trick. There aren’t enough IMO geometry problems online to train an LM. So they generated hundreds of millions of synthetic proofs by: (1) randomly plotting points/lines, (2) running DD+AR forward to derive everything provable, (3) erasing portions of the diagram backward. The erased portions become the auxiliary constructions the LM has to learn to re-propose. 9 million required at least one auxiliary construction; the most complex synthetic proof was 247 steps with 2 auxiliary constructions.
- [00:11:30] Final result: DD + AR + LM = 25 of 30 problems (silver medal performance), up from 18/30 without the LM.
- [00:12:30] The closing claim. The interesting part isn’t AI doing geometry — it’s the demonstration that creativity + logic together is a transferable architecture for hard reasoning across science, medicine, engineering. Geometry is the testbed; the architecture generalizes.
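The rule-chaining idea behind DD can be sketched in a few lines. This is a toy forward-chaining fixpoint over hand-written string facts, not DeepMind’s engine or its geometry language; the two rules loosely stand in for the “crossing lines” and “Z” facts from the [00:01:30] segment.

```python
def deduce(facts, rules):
    """Chain rules (frozenset of premises -> conclusion) to a fixpoint,
    the way DD exhausts its 75-rule database on a problem statement."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Two toy rules standing in for "vertical angles are equal" and the
# alternate-angle "Z" rule; the fact names are illustrative strings.
rules = [
    (frozenset({"lines_cross(AB,CD,P)"}), "eq(angle_APC, angle_BPD)"),
    (frozenset({"parallel(AB,CD)", "eq(angle_APC, angle_BPD)"}),
     "eq(angle_BPD, angle_PDC)"),
]
facts = {"lines_cross(AB,CD,P)", "parallel(AB,CD)"}
print("eq(angle_BPD, angle_PDC)" in deduce(facts, rules))  # True
```

The second conclusion is only reachable because the first rule fired, which is the whole point of the segment: with enough rules, chaining alone proves non-trivial theorems.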
Notable claims
- [00:00:30] AlphaGeometry solved 25 of 30 IMO geometry problems (Jan 2024) — silver-medalist performance. Better than the vast majority of Olympiad participants.
- [00:04:30] DD alone solves 7 of 30 — first time any brute-force method solved IMO geometry problems at all.
- [00:06:30] DD + AR solves 14 of 30, and DD + AR + 75 human-coded heuristics solves 18 of 30 (near bronze medal) — all without any neural network.
- [00:11:00] DeepMind generated hundreds of millions of synthetic geometry proofs, with 9 million requiring at least one auxiliary construction, to train the LM with no scarce real-world data.
- [00:12:00] The longest synthetic proof: 247 steps, 2 auxiliary constructions — gives a sense of the scale of geometric reasoning AlphaGeometry’s LM was trained against.
- The general architecture (LM = creative proposer, symbolic engine = logical verifier) is presented as the canonical neuro-symbolic loop, with explicit framing as “creative brain + logical brain.”
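The synthetic-data claim above boils down to a simple shape: run a deterministic deriver forward, then erase some of its inputs so a model must learn to re-propose them. A minimal sketch under assumed interfaces (the `derive` stand-in is not DD+AR, and the training-pair layout is illustrative, not DeepMind’s):

```python
import random

def derive(facts):
    """Stand-in for running DD+AR forward: derives one new fact per
    adjacent pair of input facts. The real system derives everything
    provable from a randomly plotted diagram."""
    facts = sorted(facts)
    return {f"derived({a},{b})" for a, b in zip(facts, facts[1:])}

def make_example(facts, n_erase=1, seed=0):
    """Erase n_erase starting facts; the erased facts become the
    'auxiliary constructions' the LM is trained to re-propose."""
    rng = random.Random(seed)
    conclusions = derive(facts)
    erased = set(rng.sample(sorted(facts), n_erase))
    return {
        "premises": facts - erased,   # what the model sees
        "target": conclusions,        # what must still be provable
        "auxiliary": erased,          # what the model must re-propose
    }

ex = make_example({"p1", "p2", "p3"})
print(ex["auxiliary"])  # the one erased premise
```

Because the deriver is deterministic, every generated pair is correct by construction, which is why the trick scales to hundreds of millions of proofs with no human labeling.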
Guests
- Aditya Chakravarti (Aleph 0) — runs the Aleph 0 YouTube channel. Higher-mathematics explainer with a “bite-sized but substantive” format. Second of four guest videos in 3Blue1Brown’s summer 2025 guest series; Sanderson explicitly endorses the channel in the closing.
- Grant Sanderson (3Blue1Brown) — channel host; appears only at the end to introduce the guest and tee up the next three guest videos in the series.
Mapping against Ray Data Co
- The DD-baseline-first framing is the right rhetorical pattern for any RDCO Sanity Check piece on AI capability. The lazy framing is “AI did X.” The load-bearing framing is “non-AI baseline already did X-1; AI closed the last gap.” That structure resists hype, gives credit to decades of symbolic / classical work, and clarifies what the LM actually contributes. Worth filing as a structural template for any RDCO piece evaluating an “AI did Y” headline. Concrete future use: when AlphaProof or successor models hit benchmarks, lead with “what did the symbolic baseline already do?”
- Neuro-symbolic hybrid (LM proposes, verifier disposes) is the right architecture pattern for RDCO’s own agent stack. Every RDCO skill is currently end-to-end LLM. The AlphaGeometry pattern suggests a stronger architecture: LM generates a candidate (filename, schema, query plan, transcription), then a deterministic verifier checks it (regex, schema validator, SQL planner, audit script). This is exactly what the new audit-newsletter-outputs.py script does for newsletter ingestion — verifier downstream of LM. Worth lifting to a vault concept doc on LLM-as-creative-brain + verifier-as-logical-brain as the canonical RDCO skill pattern, with AlphaGeometry as the load-bearing example.
- Synthetic-training-data generation by reverse-running a deterministic engine is a transferable trick for RDCO. DeepMind didn’t have enough IMO problems, so they ran the symbolic solver forward then erased portions of the diagram. RDCO faces the same “not enough founder-voice training data” problem for any voice-tuned LM work. The reverse-trick equivalent: take published Sanity Check pieces, run them through a structural-decomposer (extract hook → tangible-vs-abstract markers → CopyThat patterns → conclusion), then erase those structural markers. Train a generator to re-propose them. This is a research-skill build, not a near-term ship, but worth filing as a candidate technique.
- The “auxiliary construction = creative leap” framing maps directly to /research-brief and /curiosity skill design. What those skills do is propose the non-obvious entity (an unexpected source, an unexpected angle, a contrarian take) that lets the deterministic downstream skill (/draft-review, /voice-match) work. The auxiliary construction is the right metaphor for what creative LM reasoning adds on top of a deterministic pipeline. Worth a one-line note in ~/.claude/skills/research-brief/SKILL.md framing the skill’s job as “propose auxiliary constructions for the editorial pipeline.”
- Aleph 0 should go on the YouTube watch list. The skill’s youtube-channels.json registry currently does not include Aleph 0. Sanderson endorsing the channel as a high-quality math explainer is strong third-party validation. Worth adding as Tier 2 (watch-only, no backfill) since the founder’s interest is in the cross-domain reasoning / math-pedagogy axis.
- The 25-out-of-30 number deserves contextualization in any RDCO writing. AlphaGeometry was tested on a 30-problem set, not on every IMO geometry problem ever. Any Sanity Check piece referencing it should clarify the test-set size and the benchmark baseline (silver-medalist threshold) so the audience doesn’t hear “AI solved IMO geometry” as “AI solved every IMO geometry problem.”
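The proposer/verifier skill shape described above fits in a dozen lines. This is a hypothetical sketch, not existing RDCO code: the function names, the filename convention, and the stand-in for the LM call are all assumptions made for illustration.

```python
import re

def propose_filename(feedback):
    # Stand-in for an LM call (hypothetical); a real skill would prompt
    # a model, passing the verifier's feedback back on retries.
    if feedback is None:
        return "AlphaGeometry Notes.MD"          # first, sloppy attempt
    return "2026-04-20-alphageometry-notes.md"   # corrected attempt

def verify_filename(name):
    """Deterministic gate: date-prefixed kebab-case with .md extension
    (an assumed convention). Returns None on pass, feedback on fail."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}-[a-z0-9-]+\.md", name):
        return None
    return "expected YYYY-MM-DD-kebab-case.md"

# LM proposes, verifier disposes, bounded retries -- the DD+AR loop shape.
feedback = None
for _ in range(3):
    candidate = propose_filename(feedback)
    feedback = verify_filename(candidate)
    if feedback is None:
        break
print(candidate)  # -> 2026-04-20-alphageometry-notes.md
```

The design point is that the verifier is cheap, deterministic, and final: the LM never gets to overrule it, only to try again with its feedback, which is the same division of labor as LM-proposed constructions feeding DD+AR.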
Open follow-ups
- Add Aleph 0 to ~/.claude/state/youtube-channels.json as Tier 2. Channel ID lookup needed via yt-dlp --print "%(channel_id)s". ~10 min.
- Vault concept doc: “LLM-as-creative-brain + deterministic-verifier-as-logical-brain.” AlphaGeometry as the load-bearing example. Map RDCO skills onto the pattern: /process-youtube (LM extracts → audit script verifies file paths/format), /process-newsletter (LM assesses → audit verifies frontmatter), /check-board (LM plans → Notion API validates state). Highlight skills that don’t yet have a verifier and queue verifier-build tasks. ~1 hour.
- Sanity Check angle: “The AI did it. The non-AI baseline did 70% of it.” Lead with the DD+AR+heuristics 18/30 baseline. Pivot to the pattern across other “AI breakthrough” stories (AlphaFold had decades of structural-bio prior work; AlphaCode rests on classical compiler/parser infrastructure). Land on the editorial discipline: always foreground the non-AI baseline so the audience sees what the LM actually added. ~1500 words.
- Concept page candidate: “Synthetic training data by reverse-running a verifier.” Needs 2 more sources (likely candidates: an AlphaCode synthetic-data write-up, and a self-play RL paper from DeepMind). Track in CANDIDATES.md as below-threshold.
Related
- ~/rdco-vault/06-reference/transcripts/2026-04-20-3blue1brown-imo-geometry-alphageometry-aleph0-transcript.md — full transcript
- ~/rdco-vault/06-reference/2026-04-19-3blue1brown-large-language-models-explained-briefly — Sanderson’s own LLM intro; pairs with this on the “what LMs actually do” axis
- ~/rdco-vault/06-reference/concepts/CANDIDATES.md — track the LM-creative-brain + verifier-logical-brain candidate