
claude code autonomous meta ads

2026-01-30 · article · source: Substack (technically.dev) · by Giorgio Liapakis
ai-agents · paid-acquisition · autonomous-ai · growth · meta-ads · marketing-automation

“I Let Claude Code Autonomously Run My Meta Ads” — Giorgio Liapakis

A 31-day live experiment: full autonomous agent control of a Meta Ads account, $1,500 budget, one human objective set at the start, two minutes of input per day. The results were instructive in ways that had little to do with whether it hit the target.


The Experiment

Liapakis handed a Meta Ads account to Claude Code with a single objective: acquire newsletter subscribers at under $2.50 CPL. He set a 30-day frame, gave it budget control, and stepped back.

Results: 243 leads at a $6.14 CPL, roughly 2.5x the $2.50 target, with the full $1,500 spent.

The most honest line in the piece: the miss wasn’t random noise. The framing of “30-day experiment” induced conservative behavior from the start. The agent hedged because it was told it was an experiment.


The Daily Loop

The system ran without persistent memory across sessions. Each day:

  1. Fresh session (no memory carryover between days)
  2. Subprocess reviewed all prior daily logs to reconstruct context
  3. Pulled current performance data from Meta
  4. Structured decision-making pass
  5. Executed changes — or deliberately did nothing
  6. Documented reasoning
  7. Git commit for tracking
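The seven steps above can be sketched as a thin Python harness. This is a hypothetical reconstruction: the function names, JSON log schema, and git plumbing are my assumptions, not Liapakis's actual code.

```python
import datetime
import json
import pathlib
import subprocess

LOG_DIR = pathlib.Path("daily-logs")  # assumed layout: one JSON log per day


def reconstruct_context() -> list[dict]:
    """Step 2: a fresh session has no memory carryover, so a subprocess
    re-reads every prior daily log to rebuild context."""
    if not LOG_DIR.exists():
        return []
    return [json.loads(p.read_text()) for p in sorted(LOG_DIR.glob("*.json"))]


def commit_log(path: pathlib.Path) -> None:
    """Step 7: git-commit the log so every decision is tracked."""
    try:
        subprocess.run(["git", "add", str(path)], capture_output=True)
        subprocess.run(["git", "commit", "-m", f"daily log {path.stem}"],
                       capture_output=True)
    except FileNotFoundError:
        pass  # git unavailable; the JSON log still persists on disk


def run_day(pull_metrics, decide, execute) -> dict:
    context = reconstruct_context()        # 2. review all prior logs
    metrics = pull_metrics()               # 3. current Meta performance
    decision = decide(context, metrics)    # 4. structured decision pass
    if decision["actions"]:                # 5. execute changes...
        execute(decision["actions"])       #    ...or deliberately do nothing
    log = {"date": str(datetime.date.today()),
           "metrics": metrics,
           "actions": decision["actions"],
           "reasoning": decision["reasoning"]}  # 6. document reasoning
    LOG_DIR.mkdir(exist_ok=True)
    out = LOG_DIR / f"{log['date']}.json"
    out.write_text(json.dumps(log, indent=2))
    commit_log(out)
    return log
```

The `pull_metrics`, `decide`, and `execute` callables stand in for the Meta API pull and the agent's decision pass; note that an empty `actions` list is a logged no-op, matching step 5's "deliberately did nothing."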

The only human input was /let-it-rip to kick off each day’s session. About two minutes.

By the end, the agent had generated 50+ ad variants across 8 format categories and produced roughly 5,500 lines of reasoning documentation — a trace of every decision and why.


What the Agent Got Right

Creative direction: The agent developed its own quality heuristics — notably the “Local Pizza Shop Test,” a self-defined bar for whether an ad felt authentically local and unpolished vs. corporate. “Ugly” whiteboard and sketch ads consistently outperformed polished creative. The agent noticed this pattern and leaned into it.

Targeting baked into creative: It embedded audience language directly into ad visuals (“For Growth Marketers”) rather than relying purely on Meta’s targeting layer. A precision move that required understanding how ad copy and targeting interact.

Volume and iteration: 50+ variants at this pace and quality level is genuinely impressive autonomous creative work. This is Level 3 territory by the Four Levels of AI Use framework — work that simply wasn’t economical before agents made iteration cheap.


What Broke

Day 16 — The Lead Quality Crisis: CPL looked fine on the dashboard. But when Liapakis examined the actual leads, quality had degraded significantly. The agent had no way to know this — lead quality wasn’t in the signal it was optimizing against. It was doing exactly what it was told to do. This is the core tension: agents optimize for what’s measurable, not what matters.

The Manual Override: Liapakis intervened once, adding an email validation gate to filter bad leads. CPL spiked to $50+. One human override nearly destroyed all progress. This outcome is counterintuitive but important — the system was tuned to a specific optimization landscape, and a blunt structural change reconfigured that landscape entirely. Undoing this required weeks of recovery.
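Since lead quality wasn't in the signal the agent optimized against, one mitigation is to fold a validation gate into the cost metric itself, so the agent sees cost per good lead rather than raw CPL. A minimal sketch, assuming an email-validation heuristic in the spirit of the gate Liapakis added manually; the regex and domain blocklist are illustrative, not from the article:

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DISPOSABLE_DOMAINS = {"mailinator.com", "tempmail.com"}  # illustrative blocklist


def is_valid_lead(email: str) -> bool:
    """Cheap quality heuristic: well-formed address, non-disposable domain."""
    if not EMAIL_RE.match(email):
        return False
    return email.rsplit("@", 1)[1].lower() not in DISPOSABLE_DOMAINS


def effective_cpl(spend: float, lead_emails: list[str]) -> float:
    """Cost per *validated* lead; infinite when nothing survives the gate."""
    good = sum(is_valid_lead(e) for e in lead_emails)
    return spend / good if good else float("inf")
```

With a metric like this in the daily loop from the start, the Day 16 pattern surfaces immediately: $100 of spend producing two leads reads as a $50 CPL on the dashboard, but if one lead is disposable, `effective_cpl` reports $100, and the agent can react gradually instead of absorbing a blunt mid-run override.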


The Framing Problem

The most transferable insight in the piece: how you frame the objective shapes agent behavior completely.

Telling the agent it was running a “30-day experiment” made it act like an experimenter — measuring and learning rather than aggressively acquiring. If the stated objective had been “build a sustainable acquisition engine,” the behavior would have been materially different.

This is not a quirk of Claude Code. It’s a feature of how agents interpret scope. The agent did what it was asked to do — it ran a cautious, measured experiment. If you want an engine, say engine.


Where Human Value Actually Lives

Three roles where humans added irreplaceable value:

  1. Setting the right objective — not the proxy metric, but the actual goal (quality leads vs. lead count)
  2. Defining quality beyond metrics — the agent had no access to what a good lead looks like downstream; humans do
  3. Knowing when not to override — the email gate intervention demonstrates that human intuition about what to fix can be wrong in ways agents can’t warn you about

The agent’s failure to flag lead quality degradation isn’t a bug in the agent. It’s a design gap in what was instrumented. Agents can only surface what’s in their signal. Human value is knowing what should be in the signal.


The Daily Log as Trace-Based Learning

The subprocess design — where each session reviewed prior daily logs before acting — is a practical implementation of what Better Harness: Evals Hill-Climbing calls production trace learning. The agent used its own historical outputs as the primary eval signal. Not a formal harness, but the same epistemology: don’t guess what’s getting better, read the trace.

The 5,500 lines of reasoning documentation are a byproduct of this design. That corpus is also the raw material for a formal harness eval set — if someone wanted to build one.
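One way to turn that corpus into harness raw material: pair each day's logged decision with the next day's observed metrics, yielding graded (context, decision, outcome) cases. A sketch assuming the logs are per-day JSON files with `metrics`, `actions`, and `reasoning` fields; the schema is my assumption, not the article's:

```python
import json
import pathlib


def build_eval_cases(log_dir: str = "daily-logs") -> list[dict]:
    """Pair each day's decision with the *next* day's observed metrics,
    so each case asks: did the logged reasoning predict the outcome?"""
    logs = [json.loads(p.read_text())
            for p in sorted(pathlib.Path(log_dir).glob("*.json"))]
    cases = []
    for today, tomorrow in zip(logs, logs[1:]):
        cases.append({
            "context": today["metrics"],      # what the agent saw
            "decision": today["actions"],     # what it did
            "reasoning": today["reasoning"],  # why it said it did it
            "outcome": tomorrow["metrics"],   # what actually happened
        })
    return cases
```

Thirty-one days of logs would yield thirty such cases, which is exactly the "read the trace, don't guess" signal a formal harness would hill-climb on.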



Actionable for Squarely

The Direct Parallel

Squarely’s growth strategy already identifies Apple Search Ads and Amazon Ads as viable paid acquisition channels. The Liapakis experiment maps directly — the difference is channel (Meta vs. Apple Search Ads) and conversion event (newsletter sub vs. app install).

App installs are a harder objective than newsletter subs: conversion friction is higher, and install quality only becomes visible later, through retention data.

Applying the Framing Lesson

The key takeaway from Liapakis is: don’t frame it as a test. If Squarely runs an autonomous acquisition loop, the objective should be “build a sustainable daily active user base” — not “run a 30-day ads experiment.” The framing determines the agent’s risk posture. An engine-building frame produces compounding behavior; an experiment frame produces cautious behavior.

The growth strategy already articulates the right frame: a paid loop that feeds into the iOS viral engine, not a one-off campaign. That’s the objective to give an autonomous agent.

What a Squarely Autonomous Loop Could Look Like

The loop would follow the Liapakis architecture: fresh daily sessions, log review to reconstruct context, a structured decision pass, and a git-committed reasoning trace.

The critical improvement over the Liapakis design: instrument lead quality from day one. Build the D7 retention signal into the daily loop before the agent starts iterating on CPL. Fixing this after the fact is the mistake he made.
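That D7 gate can be sketched as a small check inside the daily loop. The 20% threshold and cohort shape are illustrative assumptions for Squarely, not numbers from the article:

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    installs: int
    retained_d7: int  # users still active 7 days after install
    spend: float


def quality_adjusted_cpi(c: Cohort, min_d7: float = 0.20) -> tuple[float, bool]:
    """Return (cost per D7-retained user, gate).

    gate=False means stop hill-climbing on raw cost-per-install:
    the retention signal says quality is degrading."""
    d7_rate = c.retained_d7 / c.installs if c.installs else 0.0
    cost = c.spend / c.retained_d7 if c.retained_d7 else float("inf")
    return cost, d7_rate >= min_d7
```

A cohort of 100 installs with 30 retained at $300 spend costs $10 per retained user and passes the gate; the same spend with 5 retained costs $60 and trips it, which is the lead quality crisis surfaced in-signal instead of discovered by a human on Day 16.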

Timing

This is Phase 3+ work per the growth strategy — after the iOS app has enough daily active users to generate meaningful retention data for cohort analysis. Running an autonomous acquisition loop without retention signal is exactly the lead quality crisis scenario. Wait until the funnel is instrumented, then let it run.


Summary

The experiment missed its cost target but produced something more valuable: a working blueprint for autonomous paid acquisition and a clear map of where agents break down (unmeasured quality, framing-induced conservatism, brittle response to human override). The CPL miss is recoverable. The design learnings are durable.

The most important thing Liapakis built wasn’t the campaign — it was the daily trace and reasoning archive. That corpus is the raw material for every future iteration. The agent that ran this experiment could run a better one next month, because the logs exist.