5 Blind Prompt Tests to Pick an LLM Writing Assistant
Practical guide to evaluating LLMs for writing tasks, using LMArena’s Creative Writing Leaderboard and five hands-on tests.
Key Ideas
Five evaluation factors for writers choosing an LLM: creativity (instruction-following vs. creative wildcard), voice matching, context window size, cost structure, and open-source vs. proprietary models. The five hands-on tests are headline generation, drafting from an outline, editing with voice preservation, research accuracy with citations, and voice matching from writing samples. As of January 2026, Claude Opus 4.5 and Gemini 3 Pro top the creative writing rankings. Recommendation: Claude Opus 4.5 for premium creative work; Claude Sonnet 4.5 or Gemini 3 Pro for everyday content.
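A minimal sketch of how the blind part of these tests might be set up, assuming you have already collected one output per model for the same prompt. The function names (`blind_outputs`, `reveal`) and the placeholder model names are hypothetical and do not come from the original guide. The idea is to hide model names behind shuffled letter labels, pick a winner per test, and only then map the labels back to models.

```python
import random
import string


def blind_outputs(outputs, seed=None):
    """Hide model names behind shuffled letter labels.

    outputs: dict mapping model name -> generated text for one prompt.
    Returns (blinded, key): blinded maps label -> text for the reviewer,
    key maps label -> model name and stays hidden until voting is done.
    Assumes at most 26 models, since labels are single letters.
    """
    rng = random.Random(seed)
    models = list(outputs)
    rng.shuffle(models)  # random order so labels reveal nothing
    labels = string.ascii_uppercase[:len(models)]
    blinded = {label: outputs[model] for label, model in zip(labels, models)}
    key = dict(zip(labels, models))
    return blinded, key


def reveal(votes, key):
    """Tally votes (test name -> chosen label) per model."""
    tally = {model: 0 for model in key.values()}
    for label in votes.values():
        tally[key[label]] += 1
    return tally


if __name__ == "__main__":
    # Placeholder model names and texts, for illustration only.
    outputs = {
        "model_x": "Headline draft one",
        "model_y": "Headline draft two",
        "model_z": "Headline draft three",
    }
    blinded, key = blind_outputs(outputs)
    for label, text in blinded.items():
        print(label, "|", text)
    # After judging every test without seeing the key:
    votes = {"headlines": "A", "outline_draft": "A", "voice_edit": "B"}
    print(reveal(votes, key))
```

In practice it probably makes sense to reshuffle for each of the five tests, so a model's label in one test does not carry over and bias the next.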
RDCO Mapping
The blind-test methodology is useful for any AI tool evaluation content. The voice-matching test framework could inform Sanity Check pieces on maintaining authenticity with AI writing tools. Current model rankings provide a snapshot of the competitive landscape.