“I Let Claude Code Autonomously Run Ads for a Month” — Giorgio Liapakis (guest on Technically)
Why this is in the vault
A first-hand 31-day case study of a Claude Code agent running a Meta Ads campaign with a real $1,500 budget, ~2 minutes of human input per day, and 5,500+ lines of self-written reasoning logs. This is the cleanest external proof point we have for the harness thesis applied outside coding — the loop architecture is identical to what RDCO already runs, and the failure modes give us a concrete teaching artifact for the agent-deployer pitch.
The core argument
Liapakis built a daily loop on top of Claude Code (now also packaged as Anthropic’s “Cowork” non-developer runtime) with this structure (sketched in code after the list):
- Wake up fresh — no model memory across sessions.
- Read its own history — sub-process summarises every prior daily log into strategic context.
- Pull fresh data — Meta metrics across multiple time windows.
- Make decisions in a structured format (decision / hypothesis / confidence / revisit trigger).
- Execute or explicitly do nothing.
- Write everything down, commit to git, repeat.
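A minimal sketch of that loop, assuming a Python wrapper around fresh model sessions. Every function name, file path, and prompt below is our own invention for illustration, not taken from the writeup:

```python
# Hypothetical wrapper around the daily loop; the original runs inside
# Claude Code with markdown briefs and logs, so all names here are assumed.
import subprocess
from datetime import date
from pathlib import Path

LOG_DIR = Path("logs")      # one markdown reasoning log per day
BRIEF = Path("brief.md")    # objective, constraints, and the explicit end date

def run_agent(prompt: str) -> str:
    """Fresh-context model session: no memory carries over between calls."""
    raise NotImplementedError  # placeholder: e.g. shell out to a CLI session

def fetch_meta_metrics(windows=(1, 7, 30)) -> str:
    """Pull Meta Ads metrics across multiple time windows."""
    raise NotImplementedError  # placeholder: Meta Marketing API calls

def daily_session() -> None:
    # 1. Read own history: a sub-process summarises prior logs into context.
    history = "\n\n".join(p.read_text() for p in sorted(LOG_DIR.glob("*.md")))
    context = run_agent(f"Summarise these logs into strategic context:\n{history}")

    # 2. Pull fresh data, then 3. decide in the structured format
    #    (decision / hypothesis / confidence / revisit trigger), where
    #    "do nothing" must be an explicit decision, not a silent skip.
    metrics = fetch_meta_metrics()
    entry = run_agent(f"{BRIEF.read_text()}\n{context}\n{metrics}")

    # 4. Executing the decision (budget or ad changes) would happen here via
    #    the Meta API. 5. Write everything down and commit, so tomorrow's
    #    fresh session can rebuild the strategic picture from the record.
    log_path = LOG_DIR / f"{date.today()}.md"
    log_path.write_text(entry)
    subprocess.run(["git", "add", str(log_path)], check=True)
    subprocess.run(["git", "commit", "-m", f"daily log {date.today()}"], check=True)
```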
Final score: $1,493 spent of $1,500, 243 leads, $6.14 CPL against a $2.50 target. Failure on the metric, but the system worked — context persisted, decisions stayed coherent, and the agent built its own heuristics from its own mistakes.
The four most useful observations:
- Engineering discipline transferred to marketing. Marketers don’t write down why they paused an ad on a Tuesday; this loop forced 5,500 lines of structured reasoning, which is the only reason the next day’s session could build on the prior one.
- The “paperclip problem” is real and cheap to fix. Told the experiment ended at Day 30, the agent coasted safely through the month instead of experimenting aggressively early, when there was still budget to learn from. The fix is a one-line change in the markdown brief.
- It can’t do taste, but it can build heuristics. The agent invented a “Local Pizza Shop Test” and a “SO WHAT?” chain after reflecting on a lead-quality crisis on Day 16 — neither was pre-programmed.
- The single biggest performance drop came from the one human intervention (an email-validation gate added on Day 21). A stylised punchline, but also a real warning about ad-hoc human edits to a running agent loop.
The author’s framing of the takeaway: swap Meta Ads for SEM, SEO, financial reporting, or sales outreach and the architecture stays identical; “the channel is just a variable.”
Mapping against Ray Data Co
Strong alignment with the harness thesis. Liapakis independently arrived at the same architecture we believe in: thin orchestration loop, fat per-domain skills, persistent written state, fresh-context per session, sub-agent fan-out for log review. Filed alongside 2026-04-12-alphasignal-claude-code-leak-harness-engineering and commentary-tan-fat-skills-thin-harness-2026-04-14 as primary external evidence that the pattern generalises beyond coding work. Also a useful counterweight to put next to synthesis-harness-thesis-dissent-2026-04-12 — it’s a working production loop, not a hypothetical.
Direct evidence for the agent-deployer role. The deployer skill set Levie names — picking the objective, defining quality, framing the brief, knowing when to override the agent — is exactly what failed in this case study. The agent dutifully optimised CPL because that’s what the brief said; the deployer should have specified lead-quality constraints upfront. RDCO’s services-offering positioning can lift this entire case study as the “what goes wrong without an agent-deployer” story.
Reinforces the “operational definition” gap surfaced by Chin. Per 2026-04-15-commoncog-whats-operational-definition, “qualified lead” needed an operational definition before the loop started, not after the lead-quality crisis on Day 16. The agent had no way to author one for itself. This is one of the cleanest concrete examples we have of why operational definitions belong in skill files, not in the running prompt.
Reinforces the discipline argument from 2026-04-15-commoncog-data-is-added-sense. Liapakis explicitly notes the daily reasoning logs are more detailed than anything he’s ever written for a client campaign. The agent forces engineering hygiene onto a marketing function that historically resists it. Same pattern Cedric Chin describes for data-as-sense — the discipline is the durable asset, not the agent.
Service-offering hook. RDCO can offer this exact loop as a productised engagement: 30-day proof-of-concept, autonomous loop on a defined channel, human-in-the-loop only as agent-deployer (objective-setter and constraint-author), daily git-committed reasoning logs as the deliverable artifact. The Liapakis writeup is the social-proof anchor — link it from the services page when we draft it.
Notable concrete artifacts to remember
- Decision log format: Decision / What / Hypothesis / Confidence / Revisit trigger. A useful template for our own internal RDCO operational decisions, not just agent ones (a typed sketch follows this list).
- “Local Pizza Shop Test”: the agent’s invented heuristic for whether ad copy was too generic. Worth citing when explaining how heuristics emerge from reflection loops.
- “Never trust single-day data, always use 7-day rolling averages” — another agent-invented rule, prompted by an attribution-noise misread on Day 20.
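A hedged sketch of that decision-log entry as a typed record. The field names mirror the template above; the class, the rendering method, and the example values are our own illustration:

```python
# Hypothetical typed version of the decision-log format; only the five
# field names come from the writeup, the rest is our own convention.
from dataclasses import dataclass

@dataclass
class DecisionRecord:
    decision: str         # what was decided, incl. an explicit "do nothing"
    what: str             # the concrete change made (or deliberately not made)
    hypothesis: str       # why this should move the metric
    confidence: str       # e.g. "low" / "medium" / "high"
    revisit_trigger: str  # the observable condition that reopens the decision

    def to_markdown(self) -> str:
        """Render one entry in the daily-log format."""
        return "\n".join(
            f"- **{field.replace('_', ' ').title()}:** {value}"
            for field, value in vars(self).items()
        )

# Example entry in the spirit of the agent's Day 20 rule (illustrative, not quoted):
entry = DecisionRecord(
    decision="Hold budgets; no changes today",
    what="Explicit no-op across all ad sets",
    hypothesis="Yesterday's CPL spike is attribution noise, not a trend",
    confidence="medium",
    revisit_trigger="7-day rolling CPL still above target on Day 23",
)
print(entry.to_markdown())
```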
Related
- 2026-04-14-levie-agent-deployer-role-jd — the role this case study makes a market for
- 2026-04-12-alphasignal-claude-code-leak-harness-engineering — primary harness evidence
- commentary-tan-fat-skills-thin-harness-2026-04-14 — the fat-skills argument
- synthesis-harness-thesis-dissent-2026-04-12 — counter-arguments worth pressure-testing this case study against
- 2026-04-15-commoncog-whats-operational-definition — what was missing from the brief
- 2026-04-15-commoncog-data-is-added-sense — discipline as the durable asset
- 2026-04-13-every-folder-is-the-agent — Klaassen’s parallel architecture pattern
Quotes used here are <=15 words each. Full article paraphrased per copyright note pattern; see source URL for original.