jeff agentic velocity quantification draft

Sun Apr 26 2026 20:00:00 GMT-0400 (Eastern Daylight Time) ·draft ·status: draft-for-founder-review ·source: founder raw notes (2026-04-27 iMessage 10:46 EDT)

Draft response to Jeff — agentic velocity, year-over-year

Editor’s note (for Ben, not Jeff)

This rolls your raw notes into a structure that opens with the executive snapshot Jeff asked for, then stages the supporting story below it so he can read down as deep as his attention allows. Keeps your voice (candid, technically grounded, Jevons Paradox closer). Two judgment calls:

  1. The IDD framework is the spine. Your four metrics (size up / attempts down / streak up / presence down) and the agentic-path ladder do all the structural work — both at exec level and in the body. Crediting IndyDevDan up front lets Jeff find the originals if he wants.
  2. Concrete “X took Y, today Z” needs more candor than your notes give it. I padded with what I think are honest bracketed estimates from MG’s recent work (Push for Progress pace, LegalShield POC, NF taxonomy cutover). Bracketed numbers are placeholders — please verify before sending. The honest version is “we couldn’t have bid this 12 months ago” rather than pretending we have stopwatch numbers.

Total length is long but well-fenced. The TL;DR is ~200 words; everything below is optional depth.


The actual draft (paste from here down)

Hi Jeff,

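Quick frame before I share lengthier thoughts: I’m using IndyDevDan’s two pieces here as the spine. His four metrics for an agentic workflow are size up, attempts down, streak up, and presence down; the table below tracks each one, plus our stage on his agentic path.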

TL;DR — quantified delta

| Metric | Apr 2025 | Nov 2025 | Feb 2026 | Apr 2026 |
| --- | --- | --- | --- | --- |
| Size of task handed to agent | Tiny: single function or code block | Medium: whole tech spec | Large: full SDLC slice | Large + railed (loop-driven) |
| Attempts to complete a unit | 5+ (no templates, no scale) | 3+ (better prompts, first MG distribution) | 2 (reset paradox in play) | 2, with self-evaluation tightening |
| Streak (one-shot steps without intervention) | 1 | 1 | 3+ | 3+ with continuous polling |
| Presence (engineer attention required) | High: alongside the agent | High | Medium: kickoff & wait, which is flow state purgatory | Medium-Small, trending toward “off-machine” |
| Stage on the IDD path | Base | Better | Custom + Orchestrator | Orchestrator + experimental Loop |

Three things that didn’t exist a year ago and now define how MG ships: mg-cc / cc-wrapped (a shared agentic harness — the first time we scaled and standardized this kind of work across engineers, vs. each engineer hand-rolling their own thing); project-docs (modular long-term memory so a session can pick up where another left off); and ae-sandbox dev schemas (the missing piece that let us actually parallelize analytics work inside one codebase, not just across clients). Without those three, none of the orchestration above works.

I feel Jevons Paradox is truly at play, not just a marketing sound bite. We got 5-10× faster on the work we used to do, and we’ve never been busier. We’re now doing work that was uneconomical 12 months ago: multi-turn audits, always-fresh documentation, TDD specs for analytics models, scheduled tech-debt paydown. The frontier moved.


The longer story (read as deep as you want)

How we got here — the path mapped to MG repos.

A year ago (April 2025) we were at Base: Copilot-style autocomplete, maybe a one-off prompt to scaffold a bronze model. The dbt codegen package could do much the same. No standardization. Each engineer’s agentic depth was exactly equal to their personal initiative.

mg-cc was created July 28, 2025. cc-wrapped was created January 4, 2026. Those are the two repos that moved us off Base. They gave the team the same prompt scaffolding, the same skills, the same context-engineering patterns — which is what Better actually means. It’s not that the model got smarter between April and November; it’s that the harness got distributable.

More (parallelizing across multiple sessions) was a persistent challenge in analytics specifically. You can open more terminals. You can use git worktrees. But analytics doesn’t actually parallelize on those alone — you need a sandboxed data layer too, or two agents step on each other inside the same warehouse. We didn’t unlock that until PR #18 in cc-wrapped when we wired in dbt clone + custom dev schemas. Worth noting we’d had a poor-man’s version of “More” for years just by having multiple clients — NF + PRG + TVS work could run in parallel because they were different codebases and different Snowflake accounts.
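To make the sandboxing idea concrete, here is a minimal Python sketch of how each agent session could get its own dev schema before a `dbt clone`. It is an illustration under assumptions, not a description of what cc-wrapped actually does: the `DBT_DEV_SCHEMA` variable, the `sandbox_schema` and `clone_command` helpers, and the `prod-artifacts` path are all hypothetical, and it assumes the dbt profile reads its schema from that environment variable.

```python
import re


def sandbox_schema(engineer: str, session_id: str, prefix: str = "dev") -> str:
    """Build a warehouse-safe schema name that is unique per agent session."""
    def clean(part: str) -> str:
        # Keep lowercase letters, digits, underscores; collapse the rest to "_".
        return re.sub(r"[^a-z0-9_]+", "_", part.lower()).strip("_")

    return "_".join([prefix, clean(engineer), clean(session_id)])


def clone_command(schema: str, state_dir: str = "prod-artifacts"):
    """Return (argv, extra_env) for a sandboxed dbt clone; nothing is executed.

    Assumes profiles.yml sets `schema: "{{ env_var('DBT_DEV_SCHEMA') }}"`,
    which is a hypothetical setup for this sketch.
    """
    argv = ["dbt", "clone", "--state", state_dir]
    extra_env = {"DBT_DEV_SCHEMA": schema}
    return argv, extra_env


if __name__ == "__main__":
    # Two sessions launched by the same engineer land in different schemas,
    # so neither agent overwrites the other's models.
    a = sandbox_schema("Ben", "session-01")
    b = sandbox_schema("Ben", "session-02")
    print(a, b)
    print(clone_command(a))
```

The point of the sketch is only that the schema name is derived from the session, so two agents in the same codebase and warehouse never write to the same place.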

Custom and Orchestrator came on top of one critical invention: project-docs, a modular knowledge base that attaches to the agentic layer. Before project-docs, every task had to fit in one session and one context window, which capped task size hard and kept presence high because the engineer was the persistent memory. Project-docs gave us long-term memory: a new session can be primed with what was already done and where it should pick up. That’s also what unlocked streak above 1: agent A finishes its part, writes a context bundle, and agent B picks it up in a fresh session with zero human handoff.
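A rough Python sketch of that handoff, assuming a bundle is just a small JSON file in a docs folder. The `ContextBundle` fields, the `context_bundle.json` filename, and the helper names are hypothetical and only illustrate the pattern; the real project-docs layout may look nothing like this.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ContextBundle:
    """Hypothetical shape of a handoff note between two agent sessions."""
    task: str
    done: list = field(default_factory=list)        # steps already completed
    next_steps: list = field(default_factory=list)  # where to pick up
    notes: str = ""


def write_bundle(bundle: ContextBundle, docs_dir: Path) -> Path:
    """Persist the bundle as JSON so it outlives the session's context window."""
    docs_dir.mkdir(parents=True, exist_ok=True)
    path = docs_dir / "context_bundle.json"
    path.write_text(json.dumps(asdict(bundle), indent=2))
    return path


def load_bundle(path: Path) -> ContextBundle:
    """Prime a fresh session from what the previous one left behind."""
    return ContextBundle(**json.loads(path.read_text()))


def priming_prompt(bundle: ContextBundle) -> str:
    """Render the bundle as text that opens the next session."""
    done = "\n".join(f"- {s}" for s in bundle.done) or "- (nothing yet)"
    todo = "\n".join(f"- {s}" for s in bundle.next_steps) or "- (nothing queued)"
    return f"Task: {bundle.task}\nAlready done:\n{done}\nPick up here:\n{todo}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Agent A finishes its part and writes the bundle.
        path = write_bundle(
            ContextBundle(task="build staging model",
                          done=["drafted SQL"], next_steps=["add tests"]),
            Path(tmp) / "project-docs",
        )
        # Agent B starts in a new session with no human in between.
        print(priming_prompt(load_bundle(path)))
```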

The custom skills then composed into orchestration. Quick tour:

Composability of those skills is what let us build the orchestrators:

Currently (April 2026) we’re experimenting with /loop for continuous polling — feed Jira tickets and GitHub PRs into a continuously-running agent and it picks up work from the queue and runs it through the railed workflows. That’s what we presented to NF (image attached separately).
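For a sense of the shape, here is a minimal Python sketch of a polling loop of this kind. It is not the /loop implementation: `WorkItem`, `poll_once`, `drain`, and the route table are hypothetical names, and the fetchers are stubs rather than real Jira or GitHub clients.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class WorkItem:
    source: str  # "jira" or "github" in this sketch
    key: str     # ticket key or PR number, as a string


def poll_once(fetchers: Iterable[Callable[[], list]], seen: set, queue: deque) -> int:
    """Pull from every source and enqueue only items not seen before."""
    added = 0
    for fetch in fetchers:
        for item in fetch():
            if item not in seen:
                seen.add(item)
                queue.append(item)
                added += 1
    return added


def drain(queue: deque, routes: dict) -> list:
    """Hand each queued item to the railed workflow registered for its source."""
    results = []
    while queue:
        item = queue.popleft()
        results.append(routes[item.source](item))
    return results


if __name__ == "__main__":
    # Stub fetchers stand in for real Jira / GitHub clients.
    fetch_jira = lambda: [WorkItem("jira", "TICKET-1")]
    fetch_prs = lambda: [WorkItem("github", "42")]
    routes = {
        "jira": lambda item: f"ticket workflow ran for {item.key}",
        "github": lambda item: f"review workflow ran for {item.key}",
    }
    seen, queue = set(), deque()
    for _ in range(2):  # a real loop would sleep between polls and keep running
        poll_once([fetch_jira, fetch_prs], seen, queue)
        print(drain(queue, routes))
```

The `seen` set is what keeps the same ticket or PR from being picked up twice on successive polls.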

Q2 2026 and beyond — the two bottlenecks I’d highlight:

  1. Targeting / done-signal. The agent needs to know when it has done a good job. /dq is one stab at this for analytics. The general problem is the next big lift — once we crack it, audit time drops dramatically and the streak metric goes way up.
  2. Presence / off-machine deployment. Right now the agentic workflow still lives on the engineer’s laptop. To get presence to actually drop to zero, we need workflows running off-laptop. OpenClaw demonstrated the shape; the industry hasn’t settled on the safe-and-enterprise version yet. I run “Claude Code Channels” as a personal Claw, but the permissions/guardrails are hand-rolled and not enterprise-ready.
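The done-signal in the first bottleneck can be pictured as a set of checks the agent must pass before it calls a unit finished. The Python sketch below is illustrative only and does not describe how /dq works; `run_until_done` and the two example checks are hypothetical.

```python
from typing import Callable, Dict, Tuple


def run_until_done(
    attempt: Callable[[int], dict],
    checks: Dict[str, Callable[[dict], bool]],
    max_attempts: int = 3,
) -> Tuple[bool, int, list]:
    """Retry a unit of work until every check passes or the budget runs out.

    Returns (done, attempts_used, failed_check_names_from_last_attempt).
    """
    failed = []
    for n in range(1, max_attempts + 1):
        result = attempt(n)
        failed = [name for name, check in checks.items() if not check(result)]
        if not failed:
            return True, n, []
    return False, max_attempts, failed


if __name__ == "__main__":
    # Toy unit of work: the first attempt leaves null keys, the second is clean.
    def attempt(n: int) -> dict:
        return {"row_count": 100, "null_keys": 5 if n == 1 else 0}

    checks = {
        "has_rows": lambda r: r["row_count"] > 0,
        "no_null_keys": lambda r: r["null_keys"] == 0,
    }
    print(run_until_done(attempt, checks))
```

The attempts count returned here is the same quantity as the attempts metric, which is why a sharper done-signal should show up directly in that number and in audit time.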

Concrete cases (best examples I have)

A few “12 months ago vs today” comparisons. Bracketed numbers are estimates — I don’t have stopwatch data, but the orders of magnitude are right:

That’s the version of the question I’d actually pitch — the most-honest framing isn’t “X used to take 5 days, now it takes 1 hour.” It’s “X used to be uneconomical to do at all, or we’d have had to cut audit corners to fit the timeline. We now do X routinely, with the audit, and the timeline is paced by business acceptance rather than engineering.”


Why we feel busier (the Jevons Paradox bit)

We made the work cheaper. We did not get less of it. We got more of it, at a higher floor.

Things that previously sat in the “uneconomical” bucket:

All of those are now within reach of a single skill or two. So we do them. And every one of them adds work to the queue while raising the standard.

The new practitioner challenge isn’t “can the agent do it” — it’s maintaining comprehension of all the work the agent is delivering. Knowing what’s happening across multiple parallel sessions. Getting paged at the right moment when the agent needs you and not the wrong one. The operating-system problem of running 3-4 agentic threads simultaneously and staying useful in each one.

That’s where I think MG’s next moat is, honestly. Tool-makers ship the agents. The teams that win are the ones that figure out the human operating model around them.

— Ben


Notes for Ben (post-draft)

A few editing flags worth a second pass before this goes to Jeff:

  1. Bracketed estimates need verification. The Push for Progress / LegalShield / NF taxonomy numbers are my padding — please replace with real numbers if you have them, or hedge harder. Jeff is going to lift this for marketing materials so the numbers should survive scrutiny.

  2. The IDD attribution. I credited him explicitly twice. Strip if Jeff already knows, or leave if you think it gives Jeff something to forward to whoever asked him. Up to you.

  3. The /dq + targeting-system thread. This is genuinely the most important thing in the whole letter for MG’s roadmap and I gave it one paragraph. If you want this to be the load-bearing closer instead of the Jevons Paradox one, swap them.

  4. OpenClaw / Claude Code Channels mention. Risk Jeff doesn’t know either reference. Add a one-liner (“an experimental project that ran agents off-machine via a custom infra layer”) if needed.

  5. Length. This is ~1,400 words. If Jeff wants the marketing-paste version it’s the table + the Jevons Paradox kicker. If he wants the internal-roadmap version it’s the whole thing. Send both halves and let him choose.