06-reference

AlphaSignal: GPT-5.5, Workspace Agents, Qwen 27B

Thu Apr 23 2026, 20:00 EDT · reference · source: AlphaSignal · by Lior Alexander
openai · gpt-5.5 · workspace-agents · qwen · agentic-coding · ai-pricing

“OpenAI GPT-5.5: 82.7% Terminal-Bench, $5/M tokens, live now” — @Lior Alexander (AlphaSignal)

Why this is in the vault

Three concurrent shifts in one issue: (1) GPT-5.5 ships with agentic-coding numbers that approach Claude’s territory at lower API cost, (2) OpenAI ships Workspace Agents — the “shared org-level bot” pattern that competes directly with Claude Skills + the autonomous-loop pattern RDCO is building, (3) Qwen3.6-27B claims to beat its own 397B sibling on coding while running locally on 18GB VRAM. All three change the cost/capability frontier RDCO is assuming for the harness.

⚠️ Sponsorship

No author-adviser relationships disclosed. Lior Alexander is the AlphaSignal founder writing in his own voice; no self-cross-promo to other AlphaSignal properties detected.

Issue contents

Top News (3 items):

  1. GPT-5.5 launch — 82.7% Terminal-Bench 2.0, 58.6% SWE-Bench Pro, $5/M input tokens, 1M context window, same speed as 5.4 with fewer tokens. Live in ChatGPT and Codex (Plus/Pro/Business/Enterprise); API “coming very soon.” Pitched as the “stop babysitting your AI” model — multi-step agentic coding, planning, tool use, self-verification.
  2. Workspace Agents in ChatGPT — shared team agents, cloud-resident, free until May 6 2026, then credit-based. Business/Enterprise/Edu/Teachers plans only. Use cases: triage, approval routing, IT tickets, Slack feedback capture, account research, call-transcript summarization (claims 5-6h/week saved per rep).
  3. Qwen3.6-27B — 77.2 SWE-bench Verified vs 76.2 for the 397B sibling, 59.3 Terminal-Bench 2.0, 18GB VRAM, Apache 2.0, 262K context, native multimodal (image+video), “Thinking Preservation” feature retains reasoning across conversation turns.

Signals: 6 items, including 2 paid.

Mapping against Ray Data Co

Strong mapping — load-bearing for the harness thesis.

  1. GPT-5.5 pricing pressure on the Claude-Code harness. $5/M input tokens with 1M context puts OpenAI ~30-40% below the Opus-tier rate RDCO is paying for the autonomous loop. If the agentic-coding scores hold (82.7% Terminal-Bench 2.0 is competitive with Claude 4.7’s reported numbers), the harness-vs-model debate Karpathy and Anthropic have been having gets a third data point: OpenAI is now shipping the “model does the harness work” pattern at a lower price point. RDCO’s current bet is that thin harness + Claude wins on judgment quality; this issue is an inflection point worth re-evaluating before the next API-cost review.

  2. Workspace Agents directly competes with the RDCO autonomous-loop pattern. OpenAI is shipping the “shared org-level bot that runs cloud-resident workflows” as a productized feature — the exact pattern Ray (this agent) implements via cron + Notion + sub-agent fan-out. Two questions: (a) does Workspace Agents subsume what we’re building, making the custom harness redundant? (b) or does the “describe a workflow, ChatGPT guides you to build the agent” pattern miss the depth of vault-grounded judgment we’re optimizing for? Worth a deeper look — this might be a candidate for /cross-check against the harness-thesis cluster.

  3. Qwen3.6-27B at 18GB VRAM is the local-model story for sensitive workflows. RDCO has flagged a future-state desire to run finance/contact-graph workflows locally. A 27B Apache-2.0 model that beats its 397B sibling on coding and runs on a single consumer GPU (18GB = RTX 4090 territory) makes that future state cheaper than expected. File for the local-model evaluation thread when it surfaces.
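The ~30-40% pricing gap claimed in item 1 can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, where only the $5/M GPT-5.5 input price comes from the issue; the Opus-tier rate and the daily token volume are illustrative assumptions, not quoted figures:

```python
# Back-of-envelope check on the "~30-40% below Opus-tier" claim.
# GPT-5.5 input price is quoted in the issue; the Opus-tier rate and
# daily token volume below are ASSUMED for illustration only.

GPT55_INPUT_PER_M = 5.00   # $/M input tokens (quoted in the issue)
OPUS_TIER_PER_M = 7.70     # $/M input tokens -- illustrative assumption

def monthly_input_cost(tokens_per_day: float, price_per_m: float, days: int = 30) -> float:
    """Cost of input tokens over a month at a flat per-million-token rate."""
    return tokens_per_day * days * price_per_m / 1_000_000

loop_tokens_per_day = 40_000_000  # hypothetical autonomous-loop volume

gpt_cost = monthly_input_cost(loop_tokens_per_day, GPT55_INPUT_PER_M)
opus_cost = monthly_input_cost(loop_tokens_per_day, OPUS_TIER_PER_M)
discount = 1 - GPT55_INPUT_PER_M / OPUS_TIER_PER_M

print(f"GPT-5.5:   ${gpt_cost:,.0f}/mo")
print(f"Opus-tier: ${opus_cost:,.0f}/mo")
print(f"Discount:  {discount:.0%}")
```

At these assumed numbers the discount works out to roughly 35%, inside the 30-40% band stated above; the real gap depends on the actual Opus-tier rate and the input/output token mix.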
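The "27B on 18GB VRAM" claim in item 3 is consistent with standard quantization arithmetic. A rough sketch, where the 27B parameter count is from the issue but the bytes-per-parameter figures are generic estimates, and KV cache and runtime overhead are ignored:

```python
# Rough sanity check on "27B on 18GB VRAM": weight memory at common
# quantization levels. Parameter count is from the issue; everything
# else is a standard back-of-envelope estimate, not a quoted spec.

PARAMS = 27e9  # Qwen3.6-27B parameter count

def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_gb(PARAMS, bits):5.1f} GB")
# 16-bit (~54 GB) clearly does not fit; 4-bit (~13.5 GB) leaves
# headroom for KV cache on an 18 GB card, consistent with the claim.
```

The takeaway: the 18GB figure implies roughly 4-bit quantization; a long-context run (262K claimed) would eat into the remaining headroom via KV cache, so effective context at 18GB is worth verifying during any local-model evaluation.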

Threshold-crossing note: GPT-5.5 + Workspace Agents in the same week is the first time OpenAI has shipped an integrated harness+model package that meaningfully threatens the “Claude is the obvious choice for autonomous COO work” assumption. Not a recommendation to switch, but a recommendation to re-examine the assumption before the next quarterly review.

Curation section — notes

Deep-fetches performed

None. Cap was 2; the email body itself had sufficient specifics on all three Top News items (benchmark numbers, pricing, availability windows, technical specs). Deep-fetches deferred to the moment a specific RDCO decision needs the official source (e.g. confirming GPT-5.5 API pricing before a harness re-evaluation).

Body summarized and paraphrased from the AlphaSignal email; benchmark numbers and pricing quoted as factual claims attributed to the original sources (OpenAI, Alibaba/Qwen). No paragraph-length copies. AlphaSignal text used under fair-use summary for assessment purposes.