Claude Code Official Plugins — Analysis
There is no “Superpowers” plugin; the name doesn’t appear anywhere in the official marketplace. What does exist is a rich set of individual plugins at ~/.claude/plugins/marketplaces/claude-plugins-official/plugins/. This analysis covers the key ones and the patterns worth borrowing.
Plugin Inventory
32 plugins total. The ones relevant to our work:
| Plugin | Type | What it does |
|---|---|---|
| skill-creator | Skill | Meta-skill for creating, evaluating, and iterating on skills |
| feature-dev | Command + Agents | Phased feature development with parallel subagents |
| code-review | Command | PR review with 5 parallel Sonnet agents + confidence scoring |
| code-simplifier | Agent | Post-edit code cleanup (Opus model) |
| playground | Skill | Interactive single-file HTML playground builder |
| plugin-dev | Skills (7) | Full plugin/skill/agent authoring reference |
| ralph-loop | Command + Hook | Self-referential iteration loop via Stop hook |
| frontend-design | Skill | Anti-AI-slop frontend generation |
Architecture Patterns
1. Three-Level Progressive Disclosure
The central organizing principle for skills:
- Metadata (name + description) — always loaded, ~100 words. This is the trigger surface.
- SKILL.md body — loaded when skill triggers, target <500 lines / <5k words.
- Bundled resources (scripts/, references/, assets/) — loaded on demand, unlimited size. Scripts can execute without being read into context.
This is genuinely well-designed. The key insight: scripts in scripts/ save tokens because they execute without needing to be loaded into the context window. Reference docs in references/ only load when Claude determines they’re needed.
What we can borrow: We already do the SKILL.md pattern. We don’t consistently use references/ for overflow content or scripts/ for deterministic work. We should adopt both.
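For concreteness, here is the on-disk shape this implies for one of our skills (folder and file names hypothetical):

```
summarize-logs/
├── SKILL.md              # metadata always loaded; body loads on trigger
├── scripts/
│   └── parse_entries.sh  # executes without being read into context
├── references/
│   └── formats.md        # loaded only when Claude decides it's needed
└── assets/
    └── report-template.html  # used in outputs, loaded only on demand
```

Each level costs progressively more context, so content should live at the deepest level that still works.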
2. Subagent Fan-Out Pattern
Feature-dev and code-review both use the same pattern: spawn multiple specialized subagents in parallel, each with a different focus, then synthesize results.
Feature-dev phases:
- Phase 2 (Exploration): 2-3 code-explorer agents in parallel, each targeting different aspects
- Phase 4 (Architecture): 2-3 code-architect agents with different tradeoff profiles (minimal, clean, pragmatic)
- Phase 6 (Review): 3 code-reviewer agents (simplicity, correctness, conventions)
Code-review agents:
- CLAUDE.md compliance auditor
- Shallow bug scanner (changes only, no extra context)
- Git blame/history analyst
- Prior PR comment checker
- Code comment compliance checker
Then a second wave of Haiku agents scores each finding 0-100, and only issues >= 80 get reported.
What’s novel: The two-wave pattern (detect with Sonnet, then score with Haiku) is a smart cost/quality tradeoff. The confidence scoring rubric (0/25/50/75/100 with specific criteria at each level) is worth stealing wholesale.
What we can borrow: The confidence-scored filtering pattern. We do subagent fan-out already, but we don’t do the second-pass scoring to filter false positives. That’s a meaningful quality improvement for any review or analysis task.
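A minimal sketch of that two-wave filter, where detect and score are hypothetical stand-ins for the Sonnet and Haiku calls and the threshold matches the >= 80 cutoff above:

```python
from typing import Callable

CONFIDENCE_THRESHOLD = 80  # findings scored below this are treated as noise

def two_wave_review(diff: str,
                    detect: Callable[[str], list[dict]],
                    score: Callable[[dict], int]) -> list[dict]:
    """Wave 1: a strong model surfaces candidate findings.
    Wave 2: a cheap model scores each finding 0-100 against the rubric."""
    findings = detect(diff)
    for finding in findings:
        finding["confidence"] = score(finding)
    return [f for f in findings if f["confidence"] >= CONFIDENCE_THRESHOLD]
```

The design point: rather than tuning the detectors to be conservative, false positives are filtered afterward by a cheap second opinion.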
3. Eval-Driven Skill Development (skill-creator)
This is the most sophisticated plugin. The loop (its core comparison step is sketched after the list):
- Capture intent through interview
- Write SKILL.md draft
- Create 2-3 test prompts
- Spawn parallel subagents: with-skill vs. baseline (no skill or old skill)
- While runs execute, draft quantitative assertions
- Grade results with a grader subagent
- Aggregate benchmarks (pass rate, time, tokens, with mean +/- stddev)
- Launch HTML eval viewer for human review
- Read feedback, improve skill, repeat
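A sketch of that core comparison step, where run_agent and grade are hypothetical stand-ins for subagent dispatch and the grader agent (the real loop runs the two branches in parallel):

```python
from typing import Callable

def compare_runs(prompts: list[str],
                 run_agent: Callable[..., str],
                 grade: Callable[[str, str], dict]) -> list[dict]:
    """Run each test prompt with and without the skill, then grade both
    transcripts, so every skill version is judged against a baseline."""
    results = []
    for prompt in prompts:
        with_skill = run_agent(prompt, skill="candidate")
        baseline = run_agent(prompt, skill=None)  # or the previous version
        results.append({
            "prompt": prompt,
            "with_skill": grade(with_skill, prompt),
            "baseline": grade(baseline, prompt),
        })
    return results
```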
Key details:
- Baseline comparison is always present — either no-skill or previous-version
- Grader agent evaluates both predefined assertions AND extracts/verifies implicit claims from the output
- Grader also critiques the evals themselves — flags non-discriminating assertions and gaps
- Description optimization is a separate loop: generates 20 realistic trigger queries (10 should-trigger, 10 shouldn’t), splits 60/40 train/test, uses extended thinking to iterate on description text, evaluates each candidate 3x for reliability
What’s novel: The grader-critiques-the-evals pattern. Having the grader not just pass/fail assertions but also flag weak assertions and suggest missing ones creates a self-improving eval loop. Also: running each eval query 3x to get reliable trigger rates, and the train/test split for description optimization to avoid overfitting.
What we can borrow: The description optimization loop is directly applicable to our skills. Our skill descriptions are hand-written and probably undertrigger. The “pushy description” guidance is a quick win — making descriptions slightly aggressive to combat Claude’s tendency to not invoke skills.
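A sketch of the measurement core of that description loop, assuming a hypothetical would_trigger(description, query) that asks a fresh model instance whether the skill would fire:

```python
import random
from typing import Callable

def trigger_score(desc: str, queries: list[str], should_fire: bool,
                  would_trigger: Callable[[str, str], bool],
                  runs: int = 3) -> float:
    """Fraction of queries decided correctly; each query runs 3x and is
    decided by majority vote, since single trigger checks are noisy."""
    def fires(q: str) -> bool:
        return sum(would_trigger(desc, q) for _ in range(runs)) > runs // 2
    return sum(fires(q) == should_fire for q in queries) / len(queries)

def pick_description(candidates: list[str], pos: list[str], neg: list[str],
                     would_trigger: Callable[[str, str], bool]):
    """Tune on 60% of queries, validate the winner on the held-out 40%."""
    pos = random.sample(pos, len(pos))  # shuffled copies, inputs untouched
    neg = random.sample(neg, len(neg))
    cp, cn = int(len(pos) * 0.6), int(len(neg) * 0.6)
    def train_score(d: str) -> float:
        return (trigger_score(d, pos[:cp], True, would_trigger)
                + trigger_score(d, neg[:cn], False, would_trigger)) / 2
    best = max(candidates, key=train_score)
    held_out = (trigger_score(best, pos[cp:], True, would_trigger)
                + trigger_score(best, neg[cn:], False, would_trigger)) / 2
    return best, held_out
```

The held-out score is what guards against overfitting the description to the exact queries it was tuned on.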
4. Self-Referential Iteration (ralph-loop)
Uses a Stop hook to intercept Claude’s exit and feed the same prompt back in. The key insight: the prompt stays constant but the file system changes between iterations, so Claude sees its own prior work.
What’s novel: Using hooks.json to create a feedback loop without an external bash wrapper. Elegant.
What we already do: We have the /loop skill for recurring tasks. Ralph-loop is specifically for “run to completion” tasks with clear success criteria (tests passing, etc). Different use case.
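For orientation, the general shape a Stop-hook registration takes in a plugin's hooks.json (schematic, based on the documented hook config format; the script path is hypothetical and the actual plugin's wiring may differ):

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/scripts/loop.sh" }
        ]
      }
    ]
  }
}
```

The command decides whether to intercept: if it prints JSON with "decision": "block" and a reason, Claude Code cancels the stop and hands the reason text (here, the original prompt) back to Claude as its next instruction. No external wrapper process is needed.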
Specific Skill Analysis
Playground Builder
Generates self-contained single-file HTML tools with:
- Interactive controls on one side
- Live preview on the other
- Prompt output at bottom with copy button
- Dark theme, system font, no external dependencies
Has 6 templates: design, data-explorer, concept-map, document-critique, diff-review, code-map. Each template defines layout, control types, rendering approach, and prompt output format.
The concept-map template is particularly interesting — it creates a canvas-based knowledge explorer where users drag nodes, draw relationship edges, mark their knowledge level (know/fuzzy/unknown), and it generates a targeted learning prompt from their markings.
What we can borrow: The “playground as prompt builder” pattern. Build a visual tool, user configures it, output is a natural-language prompt they copy back into Claude. This is a great pattern for complex configuration tasks.
Frontend Design
An anti-AI-slop manifesto. Key rules:
- Never use Inter, Roboto, Arial, or system fonts
- Never use purple gradients on white (the universal AI aesthetic)
- Pick an extreme aesthetic direction and commit to it
- Match implementation complexity to the vision
- Every generation should be visually distinct
What we can borrow: The framing. If we ever build frontend skills, this is the right attitude. But more broadly, the “commit to a bold direction” principle applies to any creative output skill.
Code Simplifier
An Opus-model agent that automatically reviews recently modified code for:
- Unnecessary complexity and nesting
- Redundant abstractions
- Nested ternaries (explicitly banned)
- Clarity over brevity as a principle
What we can borrow: The “autonomous post-edit cleanup” pattern. We have /simplify registered but could use this as a model for what it should actually check.
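To make the nested-ternary rule concrete, an illustrative before/after (example ours, not taken from the plugin):

```python
score = 7

# Before: nested ternary, the construct the simplifier bans outright
label = "high" if score > 10 else ("medium" if score > 5 else "low")

# After: guard clauses that read top to bottom
def label_for(score: int) -> str:
    if score > 10:
        return "high"
    if score > 5:
        return "medium"
    return "low"

assert label_for(score) == label == "medium"
```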
Writing Style Patterns
Three distinct voices depending on the component type:
| Component | Voice | Example |
|---|---|---|
| Skills (SKILL.md body) | Imperative/infinitive | “Parse the frontmatter using sed.” |
| Agents (system prompts) | Second person | “You are an expert code analyst…” |
| Descriptions | Third person | “This skill should be used when the user asks to…” |
The skill-creator’s writing guidance is the most interesting part:
- Explain the why, not just the what. “If you find yourself writing ALWAYS or NEVER in all caps, that’s a yellow flag — reframe and explain the reasoning.”
- Generalize from examples. Don’t overfit to test cases. “Oppressively constrictive MUSTs” are a sign of overfitting.
- Keep prompts lean. Read transcripts to find unproductive work, then remove the instructions causing it.
- Look for repeated work across test cases. If all test runs independently wrote similar helper scripts, bundle that script into the skill.
What’s Genuinely Novel vs. What We Already Do
Novel (worth adopting)
- Confidence-scored two-pass review — detect with strong model, score with cheap model, filter below threshold
- Grader that critiques its own evals — self-improving evaluation loop
- Description optimization loop — systematic trigger testing with train/test split
- Progressive disclosure with scripts/ — deterministic scripts that execute without context window cost
- Pushy descriptions — deliberately aggressive skill descriptions to combat undertriggering
- Playground-as-prompt-builder — visual configuration that outputs natural language prompts
Already doing (validated by seeing it here)
- Subagent fan-out for parallel analysis
- SKILL.md as the skill format
- Phased development workflows (discovery -> design -> implement -> review)
- Loop-based iteration patterns
- Markdown-based agent definitions with frontmatter
Not relevant to us
- .skill packaging/distribution (we’re a single-org setup)
- HTML eval viewer (our eval loop is conversational)
- Claude.ai/Cowork compatibility layers
- Plugin marketplace structure
Action Items
- Audit our skill descriptions — apply the “pushy description” pattern. Add more trigger phrases. Test whether skills actually fire on realistic prompts.
- Add references/ and scripts/ dirs to skills that have grown large. Move overflow content out of SKILL.md.
- Adopt confidence scoring for any review/analysis skill. The 0-100 rubric with specific criteria at each level is directly reusable.
- Consider a description optimization pass on our highest-value skills. The systematic approach (generate test queries, iterate on description, measure trigger rate) is worth doing even manually.
- Bundle repeated scripts. If we notice Claude writing the same helper code across invocations of a skill, freeze that script into scripts/.