The Magic of Small Databases — Tom Critchlow
Summary
Critchlow imagines a “Substack for databases”: an easy tool for creating, maintaining, and publishing small datasets, with optional paywalling, email updates to subscribers, and built-in data cleanup via ML and web scraping. The core insight is that publishing small, curated datasets can be market-making, just as Substack made newsletter publishing market-making for writers.
Relevance
This is the founding vision for 01-projects/data-marketplace/index. The value isn’t in big data — it’s in small, opinionated, well-maintained datasets that individuals and small teams publish and monetize. The “Substack for databases” framing is the cleanest pitch we’ve seen for the concept.
It also connects to 06-reference/concepts/compounding-knowledge: a curated dataset that is updated over time compounds in value, because each addition makes the whole more useful. The email-updates-to-subscribers model also maps to the audience-building patterns in 01-projects/newsletter/index.
Open Questions
- What’s the minimum viable version of this for Ray Data Co? A single published dataset with a subscription layer?
- Which domains have the highest-value small datasets? (Real estate, recruiting, niche industry data?)
- How does AI-assisted data cleanup change the economics? Can one person maintain datasets that previously required a team?