# XcodeBuildMCP

## Summary
XcodeBuildMCP is an open-source MCP server and CLI by Sentry that gives AI agents full control over Xcode workflows: build, run, test, debug (LLDB), and deploy to simulators and physical devices, plus UI automation and screenshot capture. The mental model is Playwright for iOS/macOS development: it closes the loop between code generation and verification by letting an agent make a change, build, screenshot the result, self-critique, and iterate autonomously.
Key capabilities:
- Full Xcode lifecycle control. Build, run, test, and debug from an MCP-connected agent. No more copy-pasting errors back and forth.
- Screenshot-driven feedback loop. The agent captures UI screenshots after builds, enabling visual self-critique without a human in the loop. This is the autonomous iteration pattern that makes AI-assisted UI development actually work.
- LLDB integration. Agents can set breakpoints, inspect variables, and step through code — debug workflows that previously required a human staring at Xcode.
- Simulator + physical device deployment. Not just simulator-only; works with real hardware, which matters for testing device-specific behaviors.
- v2.0 CLI for scripting/CI. The CLI addition means this can be embedded in CI pipelines, not just interactive agent sessions.
MIT licensed. Works with Claude Code. Website: xcodebuildmcp.com.
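Hooking it up to Claude Code should be a one-line MCP registration. A minimal sketch — the npm package name `xcodebuildmcp` is an assumption here; verify it against the project README:

```shell
# Register XcodeBuildMCP as an MCP server in Claude Code.
# `claude mcp add <name> -- <command> [args...]` runs the given
# command as a stdio MCP server whenever a session starts.
# Assumption: the server is published on npm as "xcodebuildmcp".
claude mcp add xcodebuildmcp -- npx -y xcodebuildmcp@latest

# Confirm the server is registered and reachable.
claude mcp list
```

Once registered, the build/run/test/screenshot tools show up in the agent's tool list automatically; no per-project configuration is needed beyond pointing the agent at the Xcode project or workspace.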
For 01-projects/squarely-puzzles/index, this is a direct capability unlock. The current iOS development workflow for Squarely Puzzles could shift from “Claude writes code, Ray builds and screenshots manually” to a fully autonomous build-test-iterate loop. Combined with 01-projects/squarely-puzzles/growth-strategy, faster iteration cycles mean faster experimentation on UI changes, onboarding flows, and visual polish.
This is a concrete instance of 06-reference/concepts/skills-as-building-blocks — MCP servers as composable capabilities that compound. XcodeBuildMCP alone is useful; XcodeBuildMCP + Claude Code + a design system creates an autonomous iOS development agent.
## Open Questions
- What is the actual reliability of the autonomous loop in practice? Screenshot-based self-critique sounds great, but how often does the agent correctly identify visual issues rather than hallucinate problems?
- How does this interact with SwiftUI previews? If SwiftUI previews are faster than full builds, is there a lighter-weight feedback loop available?
- Could this be combined with App Store Connect APIs for an end-to-end deploy pipeline — code change to TestFlight build with zero human intervention?
- What are the security implications of giving an AI agent LLDB access and physical device deployment?