IndyDevDan — Claude Opus 4.5: The Engineers’ Model (Transcript)
Source: https://www.youtube.com/watch?v=3kgx0YxCriM
Title: Claude Opus 4.5: The Engineers’ Model
Channel: IndyDevDan
Duration: 32m 12s
Published: 2025-12-01
Engineers, the king is back. This model is like that top-tier engineer — when they walk into the meeting, everyone shuts up. Check this out. claude --model opus. Man, have I missed running this command. Claude Code Opus.
I’m always looking for new capabilities that were impossible before the model’s release. Let me show you two unique advantages Opus 4.5 can give you and your engineering. One is obvious — it’s the reason why Opus 4.5 destroyed Gemini 3 on launch. The other is less obvious — it’s the reason why Opus 4.5 is the best model for agentic coding for engineers.
Let’s take this Claude Code Opus 4.5 instance and have it delegate a non-trivial task to sub-agents: /generic-browser-test. We’re going to paste in a URL and a plan file, with parallel: true and headed: true. Anthropic has its eyes on a key pillar of great engineering and briefly mentions it in the launch blog post. If we search for “sub-agents,” they explicitly state: “Opus 4.5 is also very effective at managing a team of sub-agents enabling the construction of well-coordinated multi-agent systems.” This is one of the key pillars of what makes Opus 4.5 so great, and most engineers miss it.
Our Opus 4.5 agent is kicking off five Opus 4.5 sub-agents to accomplish work. What are these agents actually doing? They’re all operating their own browser, and they’re all running tasks on the Opus 4.5 release. One is working through the system card: it’s going to download it and do some work there. Another is doing a models-overview check. We’ve deployed more compute against a specific problem, at scale. Every time a new powerful model like this is released, specifically one of the Claude models, you don’t just get the benefits one time. You get them N times, if and only if you’re delegating to multiple agents.
Opus out of the box is better at writing prompts for your agents and your sub-agents. This is the first capability Opus 4.5 unlocks for your engineering work — enhanced agent delegation. This release concretely showcases that you can automate your UI testing. You can automate entire swaths of work when you spin up multiple agents. Every one of these agents is operating Claude Opus 4.5.
You might think that you are prompting your sub-agents. You’re not. Your sub-agents are not responding to you. What’s happening here is: you prompt your primary agent, your primary agent prompts your sub-agents, your sub-agents respond back to your primary agent, and your primary agent responds back to you. This is critical to understand, because with every great model release Anthropic is dialed into this signal. To be super clear: they’re training Opus 4.5 to be a better prompt engineer, just like you or I prompting our agents, prompting one to N agents. First you want better agents, then you want more agents. They’re training Opus to prompt sub-agents: it calls the Task tool and writes a prompt to it. That means if it can prompt a sub-agent, it can prompt any agent.
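To make that loop concrete, here is a minimal sketch of the delegation pattern using the Anthropic Python SDK. The “task” tool here is a hypothetical stand-in for Claude Code’s internal sub-agent Task tool, and the model id is an assumption; the point is that the primary model, not you, writes each sub-agent’s prompt.

```python
# Minimal sketch of primary-agent -> sub-agent delegation with the Anthropic
# Python SDK. The "task" tool is a hypothetical stand-in for Claude Code's
# internal sub-agent tool; model ids may differ in your account.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TASK_TOOL = {
    "name": "task",
    "description": "Delegate a focused unit of work to a sub-agent.",
    "input_schema": {
        "type": "object",
        "properties": {"prompt": {"type": "string"}},
        "required": ["prompt"],
    },
}

def run_subagent(prompt: str) -> str:
    """A sub-agent is just another model call, with a model-written prompt."""
    reply = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.content[0].text

# You prompt the primary agent; it decides how to split the work and writes
# one prompt per sub-agent by calling the task tool.
primary = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=2048,
    tools=[TASK_TOOL],
    messages=[{"role": "user", "content": "Summarize the Opus 4.5 release page."}],
)

# Sub-agents respond to the primary agent, which then responds to you.
for block in primary.content:
    if block.type == "tool_use" and block.name == "task":
        print(run_subagent(block.input["prompt"]))
```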
This is what makes Opus 4.5 in the Claude series distinct. They’re specializing their models to be engineering models — where you delegate and hand off work to other units of compute. This is super critical for scaling your compute to scale your impact.
Five-agent summary. You can treat tool calls and token usage as a rough proxy for the value your agents generated. We have a summarized document. Superior agentic capabilities for autonomous tasks. Multi-agent orchestration for managing sub-agent teams. If your agent can prompt sub-agents, it can prompt any agent. Agents will be calling agents in the future.
Pricing. Remember the previous Opus 4.1 pricing, which was abysmal: $15/M input, $75/M output. They’ve cut that to a third: $5/M input, $25/M output. This is now a state-of-the-art model with state-of-the-art pricing. Premium pricing for premium compute. I like this positioning from Anthropic. A lot of engineers want everything to be free. That is simply not reality. Valuable things are by nature not free. If something is free and it is valuable, someone put a lot of work into making it that way for you, or you are the product. Keep this close to heart as you see these model prices drop and drop. You see this on the free plans: all that data is being collected by the generative AI company.
OpenRouter reports Opus at about 60 tokens per second. Somehow it’s at once very cheap, very fast, and state-of-the-art for engineering work. Absolutely incredible. I have no idea how they broke that barrier. I assume there are some model optimizations happening, maybe chip optimizations from their collaborations with Google and AWS. It’s really wild: within just one week, Gemini 3 was at the top of the leaderboard and I was using the Gemini CLI a lot more; then Opus 4.5 dropped, and now it’s a no-brainer. You can’t compete with this model. Specifically for software engineering work, Anthropic is putting their foot down: the most important thing for these models is to be great at engineering. For you and I, the engineers with our boots on the ground every single day looking for the value and the signal in these models, this is the best model. It is the perfect model for scaling real engineering work. A nice side effect is that it’s great for product development work as well.
Open results. Browser UI testing results. The original prompt was a URL and a markdown file (which itself was just a prompt). A task-based list: summarize the release, screenshot every image, get the price, get the system card, and summarize the orchestration details. I’m always looking for orchestration keywords: sub-agent keywords, long-running, duration. So I had my agent look for this stuff, for signals inside the system card of the Opus 4.5 release. Then I wanted to understand how to use every one of these models, because all of them can be useful at specific points in time for different problems. If Haiku can solve the problem, why would you use Sonnet or Opus? It’s much faster and much cheaper for you. Of course, if you’re using Max or Pro and don’t care about waiting a little, just use whatever model you want. But if you’re deploying these models as real custom agents in your applications and products, like you should be, it’s not just about agentic engineering. It’s about deploying agents against real problems that you and your users have.
The way I like to think about this is the model stack: fast/cheap, workhorse, powerful. Now it looks like Opus is going to be both the workhorse and the powerful model.
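As a sketch of that stack in code: the tier names follow the fast/cheap, workhorse, powerful framing, and the model ids and routing rule are assumptions for illustration.

```python
# Illustrative routing across the model stack. Tier names follow the
# fast/cheap -> workhorse -> powerful framing; model ids are assumptions.
MODEL_STACK = {
    "fast": "claude-haiku-4-5",      # cheap, quick checks and glue work
    "workhorse": "claude-opus-4-5",  # day-to-day engineering default
    "powerful": "claude-opus-4-5",   # Opus now covers both top tiers
}

def pick_model(difficulty: str) -> str:
    """If Haiku can solve the problem, use Haiku; otherwise reach for Opus."""
    return MODEL_STACK["fast" if difficulty == "easy" else "workhorse"]

print(pick_model("easy"))  # claude-haiku-4-5
print(pick_model("hard"))  # claude-opus-4-5
```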
Summarize the release. Screenshots of every image. Browser test plans. Aider polyglot performance — shoutout Aider, the original AI coding tool. I don’t know if you know this, but Claude Code is actually inspired by Aider benchmarks. A bunch of SVGs from the site. One of our agents went through the site, downloaded all the images, looked at them, and named the files. Very powerful workflow.
The big two takeaway items: we can delegate a lot of detailed work to sub-agents, and every single agent — sub-agent or not — can do longer-running, harder tasks.
The PDF system card was extracted via PDF-to-text, then keyword-searched. Another sub-agent looked up all the prices for current-generation Claude models. Haiku for speed and cost. Sonnet is the recommended default, but I don’t think that’s true anymore. By default, we want to use that powerful workhorse. I’m using Claude Opus 4.5 24/7.
This all happened with a browser automation task. You can redeploy prompts like this over and over. This is running inside our agent sandbox skill codebase. We have an entire browser tool suite that the agents use to operate. We’re not blowing away our context window; we have better agents operating our agentic coding jobs.
Browser automation is powerful, but the real value is agentic browser testing at scale. You can have Opus 4.5 accomplish nasty engineering work on your behalf at scale over and over. Opus 4.5 running in Claude Code offers you top-tier multi-agent on-the-fly testing.
Testing is important because it increases your review velocity — one of the two primary constraints of agentic coding. There’s planning and there’s reviewing. If you’re engineering properly with agents, you’re probably stuck spending most of your time in one of these two locations.
In last week’s video, we gave Gemini 3, Claude Code, and Codex CLI their own computers for a total of 15 agent sandboxes. In my agent sandbox UI, I have five sandboxes running. These were all built out by Opus. In fact, Opus one-shotted these applications.
Let’s run another prompt. fork-terminal, claude code opus, with summary. Use the agent sandbox skill, then list sandboxes, then open every public URL in Chrome. We haven’t talked about forking agents on the channel yet. Forking your agents and spinning off work is ultra-important when you’re engineering on multiple problems day after day. Notice how intricate this prompt is. Our forked terminal is right there. It’s immediately running Claude Code Opus in a brand new window.
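For intuition, here is a rough sketch of what a fork-terminal style hand-off could look like on macOS. The skill’s real implementation isn’t shown in the video, so the osascript-plus-Terminal.app approach below is entirely an assumption.

```python
# Rough sketch of a fork-terminal style hand-off on macOS. The real
# fork-terminal skill is not shown in the video; this approach
# (osascript driving Terminal.app) is an assumption for illustration.
import subprocess

PROMPT = (
    "Use the agent sandbox skill, then list sandboxes, "
    "then open every public URL in Chrome."
)

# `claude --model opus "<prompt>"` starts a fresh Claude Code session with
# an initial prompt; osascript asks Terminal.app to run it in a new window.
escaped = PROMPT.replace('"', '\\"')
script = (
    'tell application "Terminal" to do script '
    f'"claude --model opus \\"{escaped}\\""'
)
subprocess.run(["osascript", "-e", script], check=True)
```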
This agent understands that everything from here on is the prompt we’re passing to another agent. Multi-agent orchestration is ultra-powerful. In our first prompt, we ran: user prompt → primary agent spun up five browser agents → they responded back to the primary agent → it responded back to us. But you can push this further, because that alone isn’t what real engineering work looks like. Really, we want something powerful: automate your UI testing. Spin up a plan. Have your agents build. Host that application. Have your agents test. Those agents respond back to your host prompt, the agent that ran this. Then it responds to you. There’s actually a piece missing: if the browser agents report that something is wrong, the host loops back to the build step or to a debug/resolver step.
This is how I created these agent sandboxes. This is all running in the agent skill codebase. It has tools to help it operate sandboxes. This agent listed all the sandboxes with this tool and then opened all of them in Chrome. We have several applications. Every single one of these applications was built out by this AI Developer Workflow: plan, build, host, browser test, respond back to host, make improvements if needed, otherwise respond back to me. These are all running in their own dedicated agent environments. I’m using E2B as my sandbox host. I like this tool. Big fan so far.
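Here’s a compact sketch of that plan, build, host, browser-test loop with the missing fix-up step wired in. The `agent` and `host_in_sandbox` functions are hypothetical stubs standing in for the skill’s real tools (Opus 4.5 sub-agent calls, an E2B sandbox, browser agents).

```python
# Compact sketch of the AI Developer Workflow described above:
# plan -> build -> host -> browser test -> fix if needed -> report back.
# `agent` and `host_in_sandbox` are hypothetical stubs for the real tools.

def agent(role: str, prompt: str) -> str:
    """Stub: in the real workflow this is a Claude Opus 4.5 (sub-)agent call."""
    return f"[{role} output for: {prompt[:40]}...]"

def host_in_sandbox(build_output: str) -> str:
    """Stub: in the real workflow this deploys to a sandbox, returns a URL."""
    return "https://example-sandbox.invalid/app"

def run_adw(spec: str, max_fix_rounds: int = 3) -> str:
    plan = agent("planner", f"Write a build plan for: {spec}")
    build = agent("builder", f"Implement this plan: {plan}")
    url = host_in_sandbox(build)

    for _ in range(max_fix_rounds):
        report = agent("browser-tester", f"Run the user stories against {url}")
        if "FAIL" not in report:
            return f"Shipped: {url}"  # otherwise respond back to me
        # The piece most workflows miss: loop back to a debug/resolve step.
        build = agent("debugger", f"Fix these reported failures: {report}")
        url = host_in_sandbox(build)

    return f"Needs human review: {url}"

print(run_adw("live voice notes app with transcription"))
```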
Last week we were looking at Gemini 3. It did a decent job of spinning these up. Opus 4.5 is completing these long-running agent workflows: long-running jobs that build entire full-stack applications.
Voice notes application. If I hit record and click allow this time, we’re talking into our computer here and getting live transcriptions. These are live transcriptions running ElevenLabs Scribe 2.5. Big shout out to ElevenLabs; lots of untapped potential there. In this live voice notes application, at any point we can stop and the delta will lock in. This creates a brand new voice note. If we hit refresh, all the data is still there. This is hosted inside one agent sandbox. I had Opus 4.5 create this entire sandbox from scratch. It’s a complex full-stack prompt running inside the skill. There are a lot of steps to it, but you should be pushing your agents further, because with every model release you can do more. The only way you can find out what you can really do is by pushing your agents further.
I have a single prompt that I can use to build full-stack applications and to quickly prototype new applications or change existing ones. Greenfield codebases (brand new) and brownfield codebases (existing work). You want your agents to create PRs. You want them to give you feedback on things. You want to engineer new work with them. Plan, build, host, test: the whole suite. This is how you should be thinking about your engineering work. This is one of the critical ideas we break down in my take on how to use agents, Tactical Agentic Coding.
Agentic Workflows, and specifically their evolution into AI Developer Workflows, where multi-step agents do work and hand it off from agent to agent, are how you deliver more value as an engineer right now.
Back to this prompt. We are doing a four-step workflow, in the same agentic prompt format I share every week. When you have a winning formula, you keep using it. Stay consistent. Make it easy to communicate to you, your team, and your agents. We’re activating our agent skill, initializing our sandbox, then running prompts. Prompts running prompts inside skills. You need to be able to control your agentics. It doesn’t really matter how you get it done; it matters that you can do it and that you know the options available to you.
Step 7: browser UI testing. These are full-stack applications running in their own agent sandbox that Claude Code running Opus 4.5 built out end-to-end. One-shotted applications. Scale your compute to scale your impact.
These are the two massive advantages that Claude Code running Claude Opus 4.5 can offer your engineering: delegation and long-running engineering tasks. You can do much more than you think with these models. To be super clear, I’m not sponsored. I don’t receive any funds from these companies. I focus on the best tool for the job of engineering. Right now, very clearly to me, the best tool for engineering is an agent that lets you delegate and run long-running engineering tasks accurately.
Sandbox UI walkthrough: a graphing tool where you can chart things, change colors, add a data point, change format (bar/pie/line), download a PNG, and update titles. Then a more challenging medium-level full-stack application. I prompt and evaluate agents in tiers of difficulty. A design tool. A decision-making tool, my favorite: a decision matrix. Add options, weigh them with criteria. Claude Code vs. Gemini CLI: simplicity, features, model selection, cost. Effectively a winning tool for helping you compare things.
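Under the hood a decision matrix is just a weighted sum per option, so here’s the core arithmetic as a sketch. The criteria weights and scores are made-up illustration values, not numbers from the video.

```python
# A decision matrix boils down to a weighted sum per option.
# Weights and scores are made-up illustration values, not the video's.
CRITERIA = {"simplicity": 0.3, "features": 0.3, "model selection": 0.2, "cost": 0.2}

OPTIONS = {
    "Claude Code": {"simplicity": 8, "features": 9, "model selection": 7, "cost": 6},
    "Gemini CLI":  {"simplicity": 7, "features": 7, "model selection": 6, "cost": 8},
}

for name, scores in OPTIONS.items():
    total = sum(weight * scores[criterion] for criterion, weight in CRITERIA.items())
    print(f"{name}: {total:.1f}")
# Claude Code: 7.7
# Gemini CLI: 7.0
```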
How was I able to increase the capabilities of Claude Opus 4.5? I just deployed more of it. Instead of just running plan/build/test with your minimalistic pytest, you can have several agents boot up a browser and really test your application. Full stack. This greatly increases the chance that your agent delivers a working version to you.
Generic browser test. We have N user-story workflows: your user clicking through your application. Several simple steps, easy to verify. You want verifiable workflows to create closed-loop prompts. We’ll run this against just a URL, with parallel: true and headed: true. Claude Opus 4.5 spins up four to eight sub-agents; Opus 4.5 spinning up Opus 4.5 agents to do this job. Our agent queues up these tasks. Same great agentic prompt format. Step by step, we’re instructing our agent to do these things. Setup. Determine execution mode, sequential or parallel. A bunch of windows. We have agents operating on the browser. I’m not doing this work; an agent is running this browser in headed mode, actually making changes to the site. Each window is an agent running a specific user story, validating specific work. This is valuable not because of classic deterministic Playwright testing. That’s the old way of doing things; deterministic code is great, and it’s part of why ADWs are so powerful. But we also want these workflows where one agent can plan, another agent can build, and another agent can run tests against the work that was spun up. Having a dynamic natural-language interface that your agent can quickly operate on is very important.
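As a rough analogue of those parallel headed runs, here’s a sketch that fans N user stories out across headed browser windows, using Playwright as a stand-in for the agent-operated browser tooling. In the video each window is driven by an Opus 4.5 sub-agent rather than a script, so the URL, the stories, and the check are placeholders.

```python
# Rough analogue of parallel, headed user-story runs. Playwright stands in
# for the agent-operated browsers; in the video each window is driven by an
# Opus 4.5 sub-agent, not a script. URL, stories, and checks are placeholders.
import asyncio
from playwright.async_api import async_playwright

URL = "http://localhost:3000"  # hypothetical sandboxed app URL
USER_STORIES = [
    "add a data point",
    "change the chart color",
    "switch the format to pie",
]

async def run_story(pw, story: str) -> None:
    browser = await pw.chromium.launch(headless=False)  # headed: true
    page = await browser.new_page()
    await page.goto(URL)
    # Placeholder check; a sub-agent would click through the full story.
    assert await page.title() != "", f"load failed for story: {story}"
    print(f"OK: {story}")
    await browser.close()

async def main() -> None:
    async with async_playwright() as pw:
        # parallel: true -> one headed browser window per user story
        await asyncio.gather(*(run_story(pw, s) for s in USER_STORIES))

asyncio.run(main())
```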
These agents are testing this full-stack application. Engineering is changing. The name of the game is not what you can do; it’s what you can teach your agents to do, how you can direct your agents, how you can communicate with your agents. Then you need to chain that work together through long chains of engineering work via agents. The agent is the new compositional unit that you and I need to pay attention to.
Two or three years ago, my top quote was: “The prompt is the fundamental unit of knowledge work and programming. If you master the prompt, you master knowledge work.” That is still true, but something has changed. We have a new unit. The prompt is still the primitive, but we have a compositional unit that is exponentially more valuable: the agent. The agent architecture wrapping the model. The new core: if you master the agent, you master knowledge work. If you master the agent, you master engineering. It’s not just about the prompt anymore. Of course, prompt engineering, context engineering, and the Core 4 are critical for knowing how to operate your agents, but now we’ve moved up. It’s about the compositional unit.
A clear framing: first you want to learn how to operate a single agent. Then you want to learn how to operate a better agent, by learning how to prompt-engineer and context-engineer. Then you want to learn how to operate more agents and scale them up, like we’re doing here: sub-agents first, and eventually agents that prompt other agents. Then you have custom agents: embed your agents into your applications and into your personal engineering workflows. Scale your compute to scale your impact. Lastly, we have the orchestration level, which helps you manage every previous level. The first version of this you’ll experience is likely Claude Code sub-agents, but you can go far beyond that.
You must master the agent and look for tooling that gives you the best agent and the best model. It is very clear Opus 4.5 is that model for engineers. When you couple that with powerful technology like agent sandboxes, you give your agents their own isolated devices that unlock three things: isolation, scale, and autonomy.
I’ll refresh: we have four charts. A fifth chart has been created by one of our agents running a test; it’s got a test chart here. Agents are operating on this. They’re testing. They’re validating. We have a nice summary, a full, complete file. All completed. All Opus agents. We have once again scaled our compute to scale our impact.
This is what’s happening when I’m building these agent sandboxes — spinning up agents to accomplish engineering work, set up prototypes, push the agents, understand the capabilities. Plan, build, host, browser-testing step with a bunch of browser agents. When you do stuff like this and you do it properly, you are conducting multi-agent orchestration. You have an orchestrator agent managing agents that then execute work for you. Orchestration is that final tier.
Another example of orchestration we ran here is the fork-terminal call. This will probably be a topic for next week’s video — I’ll showcase exactly how I built this fork-terminal skill and just build a skill from scratch. Understanding how to build the right agentics to help you engineer with your agents is a critical task.
What does the release of Gemini 3 and Claude Opus 4.5 really mean for us engineers? What high-conviction bets can we make as engineers in the age of agents with powerful compute at our fingertips? This is going to be the main topic of our 2026 predictions video coming up on the channel in December.
Opus 4.5 makes three things crystal clear:
- You can hand off more work to powerful agents like Claude Code than you think you can. You need to push your prompts harder, push your skills harder, and push your agents harder, further, and longer.
- Premium compute is absolutely worth the price. It’s now more affordable than ever to run Claude Opus 4.5. Always consider the time you’re getting back from using this model. If Claude Opus does the job in 5 tool calls and it takes Sonnet or Gemini 3 10 tool calls, you have still saved money, and it’s done the job in half the time (see the cost sketch after this list). This is why companies hire great engineers: they get the job done not only faster; sometimes the job is only possible for highly intelligent, highly skilled, highly experienced builders, agents, and models.
- New capabilities are unlocked by this model. You will only find out if you are pushing it — if you’re writing bigger and better prompts.
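To make the second bullet concrete, here’s the back-of-the-envelope arithmetic as a sketch. The per-call token counts are assumptions, and the prices are the published per-million-token rates as I understand them at the time of the video; double-check current rates.

```python
# Back-of-the-envelope cost comparison for the premium-compute bullet.
# Token counts per tool call are assumptions; prices are USD per 1M tokens
# as published at the time of the video (double-check current rates).
PRICES = {  # (input, output) per 1M tokens
    "opus-4.5": (5.00, 25.00),
    "sonnet-4.5": (3.00, 15.00),
}

TOKENS_PER_CALL = (20_000, 2_000)  # assumed (input, output) tokens per call

def job_cost(model: str, tool_calls: int) -> float:
    in_price, out_price = PRICES[model]
    in_tok, out_tok = TOKENS_PER_CALL
    return tool_calls * (in_tok * in_price + out_tok * out_price) / 1_000_000

print(f"Opus, 5 tool calls:    ${job_cost('opus-4.5', 5):.2f}")    # $0.75
print(f"Sonnet, 10 tool calls: ${job_cost('sonnet-4.5', 10):.2f}") # $0.90
```

Even at premium per-token rates, fewer tool calls can make the premium model the cheaper job, before you even count your own time.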
It’s not just about model intelligence anymore. It’s about the agent harness — what are you putting the model in? — and the agentic tooling that you give your agents to operate. Sub-agents, other primary agents, agent sandboxes, your AI tooling in general, specifically your agentic tooling. What unique capabilities, what advantages are you giving your agents that allow you to scale your compute to scale your impact? Orchestrating many agents to accomplish work is a massive theme. One agent is not enough.
If you understand that valuable things are not always free and you want to accelerate your agentic coding, check out Tactical Agentic Coding. The big theme: build the system that builds the system. Build the agents that run your application. Don’t build the application yourself anymore. You have agents for that. Focus on the agentic system.
Powerful compute is here. The question for you and I: what can we do with this technology that we couldn’t do before? The answer is very clear. We can delegate more and better than ever. We can run longer, more challenging, more complex workflows with powerful models like Gemini 3 and especially the new Claude Opus 4.5 running in Claude Code.
Stay focused and keep building.