# Is OSS the Future of AI? — Tristan Handy

## Summary
Tristan Handy (founder of dbt Labs) laid out seven open questions defining the AI landscape in mid-2023. Written at the inflection point when smaller open models began closing the gap on massive proprietary ones, the piece frames tensions that remain structurally relevant even as the specific models have changed:
- Scale vs. iteration speed. Do hundreds-of-billions-parameter models necessarily win, or does their size slow experimentation so much that smaller models with faster iteration cycles catch up?
- Fine-tuning vs. foundation model quality. How much does the base model matter relative to domain-specific fine-tuning? If fine-tuning is king, open models win because the community iterates faster.
- Open vs. closed. If smaller + fine-tuned + community-driven outperforms massive + closed, OSS has a structural advantage. (By 2026, this has partially played out with models like Llama, Mistral, and DeepSeek.)
- Proprietary datasets as moats. Will Google/Meta’s data advantages translate to AI dominance, or will high-value AI use cases depend on different, domain-specific datasets?
- Regulability of open-source AI. If cutting-edge AI consists of small, customized layers traded through open communities, regulation becomes nearly impossible to enforce.
- International competitiveness. Can any country regulate proprietary models without ceding ground to less-regulated competitors?
- Predictable societal effects. What near-term harms can we confidently predict, and are there realistic mitigations?
The dataset question (#4) is the one most relevant to 01-projects/data-marketplace/index — the thesis that domain-specific, curated datasets have outsized value for AI fine-tuning and retrieval is essentially the data marketplace bet. See also 06-reference/2026-04-03-magic-of-small-databases on the “Substack for databases” concept.
The open-vs-closed tension maps to SOUL.md’s preference for composable, modular tools over monolithic platforms — the same logic that favors open models favors an agent architecture built from interchangeable parts (06-reference/concepts/skills-as-building-blocks).
## Open Questions
- Three years later, the answers to questions 1–3 appear to be “both”: open models are competitive, but frontier closed models still lead on reasoning. Has the question shifted from “open vs. closed” to “which layer of the stack benefits most from being open”?
- How does the fine-tuning vs. foundation model question apply to the agent architecture? Are we fine-tuning models, or mostly prompting them?
- What proprietary datasets would be most valuable for 01-projects/phdata/index clients to build AI products on top of?