Lessons from Building Claude Code: How Anthropic Uses Skills Internally
Summary
Thariq from the Claude Code team shares how Anthropic uses skills internally, with hundreds in active use. Skills cluster into nine recurring categories: (1) Library/API Reference, (2) Product Verification, (3) Data Fetching & Analysis, (4) Business Process & Team Automation, (5) Code Scaffolding & Templates, (6) Code Quality & Review, (7) CI/CD & Deployment, (8) Runbooks, and (9) Infrastructure Operations. The most powerful skills are not just text — they’re folders containing scripts, assets, and reference code that agents can discover and manipulate. Verification skills are singled out as the highest-ROI investment.
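The "folders, not files" point can be pictured with a hypothetical skill layout (the file names below are illustrative, not from the talk; `SKILL.md` is the Claude Code convention for the instructions file the agent reads first):

```
pdf-report-skill/
├── SKILL.md            # instructions the agent loads when the skill triggers
├── scripts/
│   └── render.py       # helper script the agent can execute directly
├── assets/
│   └── template.html   # reference material bundled alongside the skill
└── examples/
    └── sample-output.pdf
```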
Why This Was Bookmarked
“how Anthropic thinks about using skills internally”
This is the source of truth on skill design from the team that built Claude Code. Directly informs how we structure 06-reference/concepts/skills-as-building-blocks and validates our operating model in SOUL.md.
Key Ideas
- Nine skill categories provide a comprehensive taxonomy for auditing skill coverage
- Verification skills are the highest ROI — worth dedicating an engineer for a week to make them excellent. Techniques include recording video of the output and enforcing programmatic assertions
- Skills are folders, not files — include scripts, assets, data alongside markdown
- Configuration options matter — dynamic hooks, creative folder structures, and composability are what separate good skills from great ones
- Business process skills benefit from log files of previous executions so the model stays consistent
- Runbook skills take symptoms (Slack threads, alerts, error signatures) and produce structured investigation reports
- Adversarial review skills spawn fresh-eyes subagents to critique, then iterate until findings degrade to nitpicks
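The programmatic-assertions idea above can be sketched as a small script a verification skill might bundle. This is a minimal illustration under assumed conventions — the artifact format and function names are hypothetical, not from the talk:

```python
import json

def verify_report(path: str) -> list[str]:
    """Run programmatic assertions against a generated JSON report,
    returning a list of failure messages (empty means the output passed).

    The point is to check structural invariants mechanically instead of
    having the agent (or a human) eyeball the output.
    """
    failures = []
    with open(path) as f:
        report = json.load(f)
    if not report.get("sections"):
        failures.append("report has no sections")
    for i, section in enumerate(report.get("sections", [])):
        if not section.get("title"):
            failures.append(f"section {i} is missing a title")
    return failures
```

An agent following the skill would run this after producing the artifact and iterate until the failure list is empty — the same loop shape as the adversarial-review pattern, but with deterministic checks instead of a critic subagent.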
Connections
This is the authoritative reference for our skill architecture. The nine-category taxonomy maps cleanly to what we’re building:
- Our standup/recap skills = Category 4 (Business Process)
- Our vault compilation = Category 5 (Scaffolding)
- Our inbox processing = Category 8 (Runbooks)
The verification emphasis connects to 06-reference/concepts/compounding-knowledge — skills that verify their own output compound faster because they catch drift early.
The data fetching category (funnel-query, cohort-compare, grafana) maps directly to 06-reference/concepts/analytics-as-craft and 01-projects/phdata/index consulting work.
Open Questions
- Are we missing any of the nine categories in our current skill library?
- Should we build verification skills for our vault compilation and inbox processing?
- How do we adopt the adversarial-review pattern for our own code quality workflows?