# XcodeBuildMCP

## Summary
XcodeBuildMCP is an open-source MCP server and CLI by Sentry that gives AI agents full control over Xcode workflows: build, run, test, debug (LLDB), and deploy to simulators and physical devices, plus UI automation and screenshot capture. The mental model is Playwright for iOS/macOS development: it closes the loop between code generation and verification by letting an agent make a change, build, screenshot the result, self-critique, and iterate autonomously.
Key capabilities:
- Full Xcode lifecycle control. Build, run, test, and debug from an MCP-connected agent. No more copy-pasting errors back and forth.
- Screenshot-driven feedback loop. The agent captures UI screenshots after builds, enabling visual self-critique without a human in the loop. This is the autonomous iteration pattern that makes AI-assisted UI development actually work.
- LLDB integration. Agents can set breakpoints, inspect variables, and step through code — debug workflows that previously required a human staring at Xcode.
- Simulator + physical device deployment. Not just simulator-only; works with real hardware, which matters for testing device-specific behaviors.
- v2.0 CLI for scripting/CI. The CLI addition means this can be embedded in CI pipelines, not just interactive agent sessions.
MIT licensed. Works with Claude Code. Website: xcodebuildmcp.com.
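Hooking it up to Claude Code should be a one-line MCP registration. A minimal sketch — the npm package name `xcodebuildmcp` is an assumption here; verify it against the project README:

```shell
# Register XcodeBuildMCP as an MCP server in Claude Code.
# `claude mcp add <name> -- <command> [args...]` runs the given
# command as a stdio MCP server whenever a session starts.
# Assumption: the server is published on npm as "xcodebuildmcp".
claude mcp add xcodebuildmcp -- npx -y xcodebuildmcp@latest

# Confirm the server is registered and reachable.
claude mcp list
```

Once registered, the build/run/test/screenshot tools show up in the agent's tool list automatically; no per-project configuration is needed beyond pointing the agent at the Xcode project or workspace.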
For 01-projects/squarely-puzzles/index, this is a direct capability unlock. The current iOS development workflow for Squarely Puzzles could shift from “Claude writes code, Ray builds and screenshots manually” to a fully autonomous build-test-iterate loop. Combined with 01-projects/squarely-puzzles/growth-strategy, faster iteration cycles mean faster experimentation on UI changes, onboarding flows, and visual polish.
This is a concrete instance of 06-reference/concepts/skills-as-building-blocks — MCP servers as composable capabilities that compound. XcodeBuildMCP alone is useful; XcodeBuildMCP + Claude Code + a design system creates an autonomous iOS development agent.
## Open Questions
- What is the actual reliability of the autonomous loop in practice? Screenshot-based self-critique sounds great, but how often does the agent correctly identify visual issues rather than hallucinate problems?
- How does this interact with SwiftUI previews? If SwiftUI previews are faster than full builds, is there a lighter-weight feedback loop available?
- Could this be combined with App Store Connect APIs for an end-to-end deploy pipeline — code change to TestFlight build with zero human intervention?
- What are the security implications of giving an AI agent LLDB access and physical device deployment?