Factory Review · 2026-W27

What the factory watched this week.

Every week our pipeline scrapes the model catalogs and vendor blogs, then judges each item against one question: would this actually improve something we run in production? Verdicts below. Watch = interesting but unproven claims · Adopt = earned a place · Ignore = noise · Product Input = a feature idea, not a factory change.

W27 review: 30 items processed across three pipelines (inbox: 15, model-watch: 15). Zero items promoted to Immediate Factory Upgrade or Factory Candidate this week. Two items Ignored; one item classified Product Input (routed to lengua); all others held at Watch.

The batch is dominated by two themes: (1) new frontier model releases (Claude Fable 5, Gemini 3/3.5/3 Flash, Opus 4.8, Claude Managed Agents) whose specific pricing/API/maturity claims cannot be personally verified and are therefore held in the verify-before-promoting lane; (2) a wave of OpenAI model pricing additions (web_search tool cost added to gpt-5.1, gpt-5.1-codex, gpt-5-nano, o1, o1-pro) plus several non-incumbent model deprecations (Baidu, Alibaba, DeepSeek, Qwen variants) that require only a quick grep to close.

The headline calls

Watchlatent-space · AINews

Anthropic Claude Fable 5 — Mythos but Safe, with Controversial Terms

Claude Fable 5 introduces a new model ID (claude-fable-5), revised pricing, a 30-day data-retention mandate, and silent RSI suppression — each independently changes routing logic, compliance posture, and output trust in any app using Anthropic SDK.

Why this verdict: High-signal item covering a live model with pricing and policy changes that touch routing decisions. Held this week because specific claims (pricing, 30-day retention mandate, RSI suppression details) are unverified specs from an aggregator.

claims to check before believing:
  • Claude Fable 5 pricing: $10 input / $50 output per 1M tokens
  • 30-day data retention mandate on Anthropic API for Fable 5
  • Silent RSI suppression behavior in Fable 5
Watchsimon-willison

Claude Opus 4.8: "a modest but tangible improvement"

Opus 4.8 adds mid-conversation system message injection and lowers the prompt-cache minimum to 1024 tokens — both reduce cost and improve context-management flexibility for long agentic sessions.

Why this verdict: Directly actionable if confirmed. Held because the 1024-token cache minimum and mid-session system message injection are specific spec claims from Simon Willison, not official docs, and require verification.

claims to check before believing:
  • Opus 4.8 prompt-cache minimum is 1024 tokens (down from previous threshold)
  • Opus 4.8 supports mid-conversation system message injection via Anthropic API
Watchsimon-willison

Live blog: Code w/ Claude 2026

Claude Managed Agents (multi-agent orchestration primitives: Dreaming, Outcomes, Routines) represent Anthropic's native agent-scheduling layer — if GA, they could replace custom orchestration code in shepherd and reduce maintenance surface.

Why this verdict: High potential if GA; held because specific feature availability and API names are unverified from a live-blog source. Watch pending docs confirmation.

claims to check before believing:
  • Claude Managed Agents (Dreaming, Outcomes, Routines) are publicly available in the Anthropic SDK as of 2026-06
Watchdeepmind-blog

Gemini 3.5: frontier intelligence with action

Gemini 3.5 Flash is a faster/cheaper replacement for the current Gemini reviewer in ai-practice-watch's Three Wise Men panel; upgrading the model ID would directly change review quality and cost.

Why this verdict: Deferred from last week. Directly actionable for ai-practice-watch Gemini reviewer. Held because specific model ID, pricing, and context window are unverified claims from vendor blog posts.

claims to check before believing:
  • Gemini 3.5 Flash model ID and exact pricing in Google AI API as of 2026-06
  • Gemini 3.5 Flash is 4x faster than previous Gemini reviewer model
Watchsimon-willison

Claude Fable is relentlessly proactive

Fable 5's autonomous agentic behavior can incur large per-session cost burns without user-visible checkpoints — the mechanism is unbounded agent execution without budget guardrails, a correctness and cost risk for any orchestration layer.

Why this verdict: Actionable concern about agent cost runaway. Held at Watch because the specific cost-per-session figure is from an aggregator and requires confirmation; also, the rule-of-two gate requires confirming shepherd actually lacks a budget cap before promoting.

claims to check before believing:
  • Claude Fable 5 autonomous sessions cost approximately $12/session — specific figure from aggregator, not Anthropic official docs
Watchdeepmind-blog

A new era of intelligence with Gemini 3

Gemini 3 is the new top-tier Gemini model; as a potential replacement for the Gemini reviewer in ai-practice-watch's Three Wise Men panel, it changes the quality ceiling for the panel's Gemini seat.

Why this verdict: Deferred from last week. Overlaps. Lower priority because Gemini 3.5 Flash is a more cost-efficient upgrade path and should be evaluated first.

claims to check before believing:
  • Gemini 3 model ID availability and pricing in Google AI API as of 2026-06
Watchsimon-willison

Initial impressions of Claude Fable 5

Simon Willison's independent impressions of Claude Fable 5 corroborate the new refusal/fallback API option and pricing — secondary corroboration source for routing and API-design decisions already flagged in.

Why this verdict: Useful corroboration but largely redundant. Held pending the same verification. Deduplicated; Watch only.

claims to check before believing:
  • Claude Fable 5 refusal/fallback API option — specific parameter name and behavior unconfirmed from official docs
Watchdeepmind-blog

Gemini 2.5: Our most intelligent models are getting even better

Gemini 2.5 Flash/Pro updates plus native MCP support in the Gemini API expand both the reviewer model options and the tool-calling surface area for ai-practice-watch's Gemini reviewer.

Why this verdict: Deferred from last week. MCP support is the novel capability but is superseded in priority by newer Gemini 3/3.5 items. Watch.

claims to check before believing:
  • MCP support in Gemini 2.5 API confirmed and stable as of 2026-06
Watchdeepmind-blog

Gemini 3 Flash: frontier intelligence built for speed

Gemini 3 Flash is a speed/cost-optimized variant of Gemini 3; potentially a drop-in for the Gemini reviewer at lower cost than Gemini 3 Pro.

Why this verdict: Deferred from last week. Useful cost data point for Gemini reviewer selection but subordinate to the 3.5 Flash evaluation. Held pending price verification.

claims to check before believing:
  • Gemini 3 Flash price: $0.50/1M input tokens; claimed to be 3x faster than Gemini 2.5 Pro
Product Inputdeepmind-blog

Fluid, natural voice translation with Gemini 3.5 Live Translate

Gemini Live API's real-time speech-to-speech translation (70+ languages, public preview) enables voice-native language learning interactions directly applicable to lengua's core use case without new self-hosted infrastructure.

Why this verdict: Deferred from last week. Clear product-input for lengua. Route to lengua backlog for evaluation. Held pending API maturity verification but classified correctly as Product Input.

claims to check before believing:
  • Gemini 3.5 Live Translate API is in public preview (not allowlist-only) as of 2026-06
  • 70+ language pairs supported in the Gemini Live Translate API
Watchdeepmind-blog

Introducing the Gemini 2.5 Computer Use model

Gemini 2.5 Computer Use enables browser/UI automation from the Gemini API — a hosted vision-action loop that drives web UIs without a local browser automation stack.

Why this verdict: Deferred from last week. Potentially useful but likely preview-maturity (maturity floor concern). No named product blocker justifying immediate action. Watch.

claims to check before believing:
  • Gemini 2.5 Computer Use API maturity status (GA vs. preview/alpha) as of 2026-06

The rest of the week (19)

WatchFable and Mythos officially too dangerous to release
WatchLoopcraft: The Art of Stacking Loops
WatchI/O 2026: Welcome to the agentic Gemini era
Watchbuilding multi tenancy rag system with llamaindex
WatchPrice change: `qwen/qwen3-235b-a22b-thinking-2507`
WatchPrice change: `qwen/qwen3.7-max`
WatchDeprecation: `alfredpros/codellama-7b-instruct-solidity`
WatchPrice change: `openai/gpt-5.1-codex`
WatchPrice change: `openai/gpt-5.1`
WatchDeprecation: `baidu/ernie-4.5-21b-a3b-thinking`
WatchDeprecation: `qwen-3-235b-a22b-instruct-2507`
WatchDeprecation: `deepseek/deepseek-v4-flash:free`
WatchPrice change: `openai/o1-pro`
WatchDeprecation: `baidu/qianfan-ocr-fast`
WatchDeprecation: `alibaba/tongyi-deepresearch-30b-a3b`
WatchPrice change: `openai/o1`
WatchPrice change: `openai/gpt-5-nano`
IgnorePrice change: `qwen/qwen-plus-2025-07-28`
IgnorePrice change: `tencent/hy3-preview`