For low-volume and frontier-only workloads, hosted APIs almost always win. Above a workload-specific volume threshold (typically when monthly inference spend crosses five figures and is dominated by repetitive small-model calls), self-hosted inference frequently lands at 10-30% of the equivalent hosted-API cost. The break-even point is more predictable than the discourse suggests.
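The break-even logic can be made concrete with a simple cost model. The sketch below compares linear per-token hosted-API pricing against self-hosted costs (GPU hours plus a fixed ops overhead); all prices and rates are illustrative assumptions, not quotes from any provider.

```python
# Hedged sketch: break-even model for hosted-API vs self-hosted inference.
# All figures below are illustrative assumptions, not quoted prices.

def hosted_cost(monthly_tokens_m: float, price_per_m_tokens: float) -> float:
    """Hosted-API cost scales linearly with token volume (tokens in millions)."""
    return monthly_tokens_m * price_per_m_tokens

def self_hosted_cost(gpu_hours: float, gpu_hourly_rate: float,
                     fixed_ops_monthly: float) -> float:
    """Self-hosted cost: GPU rental/amortization plus a fixed ops overhead."""
    return gpu_hours * gpu_hourly_rate + fixed_ops_monthly

def break_even_tokens_m(price_per_m_tokens: float, gpu_hours: float,
                        gpu_hourly_rate: float, fixed_ops_monthly: float) -> float:
    """Monthly token volume (millions) at which the two cost curves cross."""
    return (self_hosted_cost(gpu_hours, gpu_hourly_rate, fixed_ops_monthly)
            / price_per_m_tokens)

# Illustrative scenario: $0.50 per million tokens hosted; one GPU running
# around the clock (~720 h/month) at $2/hour, plus $3,000/month of ops overhead.
tokens_m = break_even_tokens_m(price_per_m_tokens=0.50, gpu_hours=720,
                               gpu_hourly_rate=2.0, fixed_ops_monthly=3000.0)
print(round(tokens_m))  # → 8880 (million tokens/month)
```

Under these assumed numbers the crossover sits at roughly 8.9 billion tokens a month, i.e. about $4,440 of monthly hosted spend; past that point the self-hosted cost curve is flat in volume while the hosted one keeps climbing, which is where the 10-30% figure comes from.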
Solutions