Draft response to Jeff — agentic velocity, year-over-year
Editor’s note (for Ben, not Jeff)
This rolls your raw notes into a structure that opens with the executive snapshot Jeff asked for, then stages the supporting story below it so he can read down as deep as his attention allows. Keeps your voice (candid, technically grounded, Jevons Paradox closer). Two judgment calls:
- The IDD framework is the spine. Your four metrics (size up / attempts down / streak up / presence down) and the agentic-path ladder do all the structural work — both at exec level and in the body. Crediting IndyDevDan up front lets Jeff find the originals if he wants.
- Concrete “X took Y, today Z” comparisons need more support than your notes give them. I padded with what I think are honest bracketed estimates from MG’s recent work (Push for Progress pace, LegalShield POC, NF taxonomy cutover). Bracketed numbers are placeholders — please verify before sending. The honest version is “we couldn’t have bid this 12 months ago” rather than pretending we have stopwatch numbers.
The draft is long but well-fenced. The TL;DR is ~200 words; everything below is optional depth.
The actual draft (paste from here down)
Hi Jeff,
Quick frame before sharing the lengthier thoughts… I’m using IndyDevDan’s two pieces here as the spine. His four metrics for an agentic workflow:
- size (going up)
- attempts (going down)
- streak (going up)
- presence (going down)
As well as his agentic-path ladder: Base → Better → More → Custom → Orchestrator.
TL;DR — quantified delta
| Metric | Apr 2025 | Nov 2025 | Feb 2026 | Apr 2026 |
|---|---|---|---|---|
| Size of task handed to agent | Tiny — single function or code block | Medium — whole tech spec | Large — full SDLC slice | Large + railed (loop-driven) |
| Attempts to complete a unit | 5+ (no templates, no scale) | 3+ (better prompts, first MG distribution) | 2 (reset paradox in play) | 2, with self-evaluation tightening |
| Streak (one-shot steps without intervention) | 1 | 1 | 3+ | 3+ with continuous polling |
| Presence (engineer attention required) | High — alongside the agent | High | Medium — kickoff & wait, which is flow state purgatory | Medium-Small, trending toward “off-machine” |
| Stage on the IDD path | Base | Better | Custom + Orchestrator | Orchestrator + experimental Loop |
Three things that didn’t exist a year ago and now define how MG ships:
mg-cc / cc-wrapped (a shared agentic harness — the first time we
scaled and standardized this kind of work across engineers, vs. each engineer
hand-rolling their own thing); project-docs (modular long-term memory so a
session can pick up where another left off); and ae-sandbox dev
schemas (the missing piece that let us actually parallelize analytics work
inside one codebase, not just across clients). Without those three, none of
the orchestration above works.
I feel Jevons Paradox is truly at play, not just a marketing sound bite. We got 5-10× faster on the work we used to do, and we’ve never been busier. We’re now doing work that was uneconomical 12 months ago — multi-turn audits, always-fresh documentation, TDD specs for analytics models, scheduled tech-debt paydown. The frontier moved.
The longer story (read as deep as you want)
How we got here — the path mapped to MG repos.
A year ago (April 2025) we were at Base: Copilot-style autocomplete,
maybe a one-off prompt to scaffold a bronze model. The codegen dbt package
could do similar. No standardization. Each engineer’s agentic depth was
exactly equal to their personal initiative.
mg-cc was created July 28, 2025. cc-wrapped was created January 4, 2026.
Those are the two repos that moved us off Base. They gave the team the same
prompt scaffolding, the same skills, the same context-engineering patterns —
which is what Better actually means. It’s not that the model got smarter
between April and November; it’s that the harness got distributable.
More (parallelizing across multiple sessions) was a persistent challenge
in analytics specifically. You can open more terminals. You can use git
worktrees. But analytics doesn’t actually parallelize on those alone — you
need a sandboxed data layer too, or two agents step on each other inside the
same warehouse. We didn’t unlock that until PR #18 in cc-wrapped when
we wired in dbt clone + custom dev schemas. Worth noting we’d had a
poor-man’s version of “More” for years just by having multiple clients —
NF + PRG + TVS work could run in parallel because they were different
codebases and different Snowflake accounts.
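For concreteness, a minimal sketch of that mechanism in Python. This is illustrative, not the actual PR #18 code: it assumes the dbt project's generate_schema_name macro reads the env var below, and every name in the snippet is made up.

```python
import os
import subprocess
import uuid

def provision_sandbox(prod_artifacts_dir: str) -> str:
    """Give one agent session its own dev schema and clone production state
    into it, so parallel sessions never write to the same tables.
    Illustrative only: assumes the project's generate_schema_name macro
    reads DBT_DEV_SCHEMA_SUFFIX, which is a made-up variable name."""
    suffix = f"agent_{uuid.uuid4().hex[:8]}"              # unique schema per session
    env = {**os.environ, "DBT_DEV_SCHEMA_SUFFIX": suffix}

    # `dbt clone` copies the production objects into the dev schema
    # (zero-copy on Snowflake), using the prod manifest as comparison state.
    subprocess.run(
        ["dbt", "clone", "--state", prod_artifacts_dir],
        env=env,
        check=True,
    )
    return suffix   # the agent session targets this schema for all later dbt runs
```

The real wiring presumably also handles teardown and selective cloning; the point is just that a unique schema per session plus cloned prod state is what lets two agents work safely inside one warehouse.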
Custom and Orchestrator came on top of one critical invention: project-docs. A modular knowledge base that attaches to the agentic layer. Before project-docs, every task had to fit in one session and one context window — which capped task size hard and kept presence high because the engineer was the persistent memory. Project-docs gave us long-term memory: a new session can be primed with what was already done and where it should pick up. That’s also what unlocked streak above 1 — agent A finishes their part, writes a context bundle, and agent B picks it up in a fresh session with zero human handoff.
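To make the handoff concrete, here is roughly the shape of that mechanism. This is a purely hypothetical sketch; the path, the JSON layout, and the function names are invented, not the real project-docs format.

```python
import json
from pathlib import Path

BUNDLE = Path("project-docs/context/example-task.json")   # hypothetical path and format

def write_bundle(done: list[str], next_steps: list[str], decisions: dict[str, str]) -> None:
    """Agent A records what was finished, what comes next, and why,
    so a fresh session can be primed with zero human handoff."""
    BUNDLE.parent.mkdir(parents=True, exist_ok=True)
    BUNDLE.write_text(json.dumps(
        {"done": done, "next": next_steps, "decisions": decisions}, indent=2
    ))

def prime_next_session() -> str:
    """Agent B starts by folding the bundle back into its context window."""
    state = json.loads(BUNDLE.read_text())
    return (
        "Already done: " + "; ".join(state["done"]) + "\n"
        "Pick up at: " + "; ".join(state["next"])
    )
```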
The custom skills then composed into orchestration. Quick tour:
- /fathom pulls meeting transcripts and notes into the context bundle alongside the code.
- /pr opens pull requests to MG’s standard, with documentation that’s more consistent than any of us write by hand.
- /review-pr is more thorough than a senior engineer’s manual review and runs in roughly 5 minutes vs the 30-60 a senior would spend. The democratization is the bigger deal than the speed: every engineer can pre-review their own work before pinging a peer, which kills a whole iteration loop.
- /brd, /plan, /ac turn requirements → tech spec → acceptance criteria.
- /google-workspace lets the agent read/write GDocs/Sheets/Slides to brand standards (this single skill killed a major handoff bottleneck — formatting was killing us at delivery).
Composability of those skills is what let us build the orchestrators:
- /triage-fathom — meeting → engineer context bundles + Jira tickets, no human in the loop.
- /workflow — full SDLC from requirements to PR. Feed the same bundle back in any session and it picks up where the lifecycle left off.
- /dq — TDD for analytics. Define test plan → instrument → run tests → triage failures → green status. This is the one I’m watching closest: it gives the agent a real “done” signal, and the lack of one is what’s currently crushing delivery speed (QA/audit on the back end is killing us when “done” is fuzzy). Rough sketch of that signal just below.
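If it helps to see the shape of that done signal: for analytics it can be as mechanical as dbt’s own test results. A rough sketch, not the /dq implementation; the function names and the retry budget are invented.

```python
import json
import subprocess
from typing import Callable

def run_until_green(selector: str,
                    fix: Callable[[list[dict]], None],
                    max_attempts: int = 3) -> bool:
    """TDD-style done signal for analytics: iterate until the declared dbt
    tests pass or the attempt budget runs out. `fix` stands in for the
    agent's triage-and-repair step."""
    for _ in range(max_attempts):
        result = subprocess.run(["dbt", "test", "--select", selector])
        if result.returncode == 0:
            return True                      # unambiguous "done"
        fix(_load_failures("target/run_results.json"))
    return False                             # still red: escalate to a human

def _load_failures(path: str) -> list[dict]:
    """Pull the failed or errored test nodes out of dbt's run_results artifact."""
    with open(path) as f:
        results = json.load(f)["results"]
    return [r for r in results if r["status"] in ("fail", "error")]
```

The exit code is the whole trick: green/red is unambiguous, and everything fuzzier than that is where audit time currently goes.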
Currently (April 2026) we’re experimenting with /loop for continuous
polling — feed Jira tickets and GitHub PRs into a continuously-running agent
and it picks up work from the queue and runs it through the railed
workflows. That’s what we presented to NF (image attached separately).
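Mechanically it’s nothing exotic: a long-running poller that drains a work queue into the railed workflows. A toy sketch; both callables are stand-ins for the real Jira/GitHub integrations and for whatever /loop actually becomes.

```python
import time
from typing import Callable

def loop(fetch_ready_work: Callable[[], list[dict]],
         run_railed_workflow: Callable[[dict], None],
         poll_seconds: int = 300) -> None:
    """Toy sketch of the /loop idea: poll a queue (Jira tickets, open PRs)
    and push each new item through an existing railed workflow."""
    seen: set[str] = set()
    while True:
        for item in fetch_ready_work():       # e.g. Jira issues, GitHub PRs
            if item["id"] in seen:
                continue
            run_railed_workflow(item)         # e.g. /workflow or /dq end-to-end
            seen.add(item["id"])
        time.sleep(poll_seconds)              # presence ~0 between interventions
```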
Q2 2026 and beyond — the two bottlenecks I’d highlight:
- Targeting / done-signal. The agent needs to know when it has done a good job. /dq is one stab at this for analytics. The general problem is the next big lift — once we crack it, audit time drops dramatically and the streak metric goes way up.
- Presence / off-machine deployment. Right now the agentic workflow still lives on the engineer’s laptop. To get presence to actually drop to zero, we need workflows running off-laptop. OpenClaw demonstrated the shape; the industry hasn’t settled on the safe-and-enterprise version yet. I run “Claude Code Channels” as a personal Claw, but the permissions/guardrails are hand-rolled and not enterprise-ready.
Concrete cases (best examples I have)
A few “12 months ago vs today” comparisons. Bracketed numbers are estimates — I don’t have stopwatch data, but the orders of magnitude are right:
- Push for Progress. The pace we ran at would not have been possible a year ago. [Estimate: would have needed ~2× the engineer-weeks at the pre-orchestration cadence to deliver the same scope.]
- LegalShield POC (2-week delivery). A year ago we would not have bid this as a 2-week project. [Estimate: 6-8 weeks at April-2025 pace, with substantially less audit coverage.]
- NF taxonomy cutover (under a month, end-to-end). A complete attribution taxonomy migration with full audit trail across the affected reports. We were clued in on the requirements late, and at April-2025 pace this would have meant 8-10 weeks of reporting blackout — flatly unacceptable for the client, which would have forced us to cut corners on the audit and ship something directionally correct but unverified. The client would have been put in a bad spot. Today: a major business-process cutover delivered in under a month, with stakeholder review driving the majority of the timeline, not engineering throughput. That last clause is the real story — the bottleneck moved from “us” to “the business,” which is exactly where you want it.
That’s the version of the answer I’d actually pitch — the most honest framing isn’t “X used to take 5 days, now it takes 1 hour.” It’s “X used to be uneconomical to do at all, or we’d have had to cut audit corners to fit the timeline. We now do X routinely, with the audit, and the timeline is paced by business acceptance rather than engineering.”
Why we feel busier (the Jevons Paradox bit)
We made the work cheaper. We did not get less of it. We got more of it, at a higher floor.
Things that previously sat in the “uneconomical” bucket:
- Multi-turn audits inside the SDLC.
- Always-up-to-date documentation.
- TDD specs and instrumentation for data models.
- Cycles spent paying down tech debt.
- Per-engineer pre-review of own work before peer review.
All of those are now within reach of a single skill or two. So we do them. And every one of them adds work to the queue while raising the standard.
The new practitioner challenge isn’t “can the agent do it” — it’s maintaining comprehension of all the work the agent is delivering. Knowing what’s happening across multiple parallel sessions. Getting paged at the right moment when the agent needs you and not the wrong one. The operating-system problem of running 3-4 agentic threads simultaneously and staying useful in each one.
That’s where I think MG’s next moat is, honestly. Tool-makers ship the agents. The teams that win are the ones that figure out the human operating model around them.
— Ben
Notes for Ben (post-draft)
A few editing flags worth a second pass before this goes to Jeff:
- Bracketed estimates need verification. The Push for Progress / LegalShield / NF taxonomy numbers are my padding — please replace with real numbers if you have them, or hedge harder. Jeff is going to lift this for marketing materials so the numbers should survive scrutiny.
- The IDD attribution. I credited him explicitly twice. Strip if Jeff already knows, or leave if you think it gives Jeff something to forward to whoever asked him. Up to you.
- The /dq + targeting-system thread. This is genuinely the most important thing in the whole letter for MG’s roadmap and I gave it one paragraph. If you want this to be the load-bearing closer instead of the Jevons Paradox one, swap them.
- OpenClaw / Claude Code Channels mention. Risk Jeff doesn’t know either reference. Add a one-liner (“an experimental project that ran agents off-machine via a custom infra layer”) if needed.
- Length. This is ~1,400 words. If Jeff wants the marketing-paste version it’s the table + the Jevons Paradox kicker. If he wants the internal-roadmap version it’s the whole thing. Send both halves and let him choose.