The Haiku→Sonnet escalation mirrors a pattern I settled on for nightly batch
processing of government filings: the cheaper model handles initial parsing
and classification, and tasks escalate to Sonnet when they involve
cross-standard normalization or narrative output.
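Roughly, the routing looks like this (a minimal sketch with the Python SDK; the task-type predicate, the specific task labels, and the model aliases are placeholders for whatever your classification pass already produces):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical predicate: in my pipeline this keys off the task label
# produced by the initial classification pass.
def needs_sonnet(task_type: str) -> bool:
    return task_type in {"cross_standard_normalization", "narrative_output"}

def run_task(task_type: str, prompt: str) -> str:
    # Default to the cheap model; escalate only for the hard task classes.
    model = "claude-3-5-sonnet-latest" if needs_sonnet(task_type) else "claude-3-5-haiku-latest"
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```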
For async workloads where latency doesn't matter, it's worth adding
Anthropic's batch API to your cost hierarchy: 50% off real-time pricing,
with results returned within 24 hours. I use it for AI summary generation
on filings where immediate results aren't needed.
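With the Python SDK it's a few lines; a sketch, assuming the filings arrive as (id, text) pairs and with a stand-in summary prompt:

```python
import anthropic

client = anthropic.Anthropic()

# Hypothetical queue of (filing_id, full_text) pairs from the nightly job.
filings = [
    ("filing-001", "Full text of the first filing..."),
    ("filing-002", "Full text of the second filing..."),
]

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": filing_id,
            "params": {
                "model": "claude-3-5-sonnet-latest",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": f"Summarize this filing:\n\n{text}"}
                ],
            },
        }
        for filing_id, text in filings
    ]
)

# Poll until processing ends, then fetch per-request results:
#   client.messages.batches.retrieve(batch.id)
#   client.messages.batches.results(batch.id)
print(batch.id)
```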
One failure mode I've hit with escalation logic: when the smaller model is
confidently wrong. It routes a complex task to itself, produces a
plausible-looking bad answer, and there's no signal that a better answer
existed. I've partially addressed this by treating low confidence scores from
the smaller model as an escalation trigger—but it's an imperfect heuristic.
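In code the trigger looks roughly like this (a sketch; the self-reported JSON confidence score is just one way to get a signal, and a poorly calibrated one, which is exactly the imperfection above):

```python
import json
import anthropic

client = anthropic.Anthropic()

CONFIDENCE_FLOOR = 0.7  # placeholder; tune against your own misrouting rate

def answer_with_escalation(prompt: str) -> str:
    # Ask the small model to self-report a confidence score with its answer.
    draft = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": prompt
            + '\n\nRespond as JSON: {"answer": "...", "confidence": <0.0-1.0>}',
        }],
    )
    try:
        parsed = json.loads(draft.content[0].text)
        if float(parsed["confidence"]) >= CONFIDENCE_FLOOR:
            return parsed["answer"]
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        pass  # unparseable output is itself treated as an escalation signal
    # Low or missing confidence: escalate to the larger model.
    final = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return final.content[0].text
```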
Do you surface the escalation decision to users at all, or just tune the
threshold and accept some misrouting as a background error rate?