AI EngineerJune 2, 202612m

What Lies Beneath the API — Benjamin Cowen, Modal

TL;DR

Fine-tuning is becoming the natural next step for mature AI products: Cowen says as products specialize, teams increasingly turn to custom models for better performance, lower cost, and tighter alignment with their actual business logic.
Frontier APIs are fast to start with, but weak on customization: Prompt tricks like "caveman mode" can cut tokens, but they do not solve hard requirements around latency, throughput, or domain-specific metrics when a startup scales 100x or lands enterprise contracts.
The economics can flip hard in favor of custom models: Cowen cites Intercom beating its frontier API at 1/5 the cost, while Pinterest reportedly saw orders-of-magnitude gains from fine-tuning.
You may already have the ingredients to train a model: If you have an agent harness, product evals, and logs showing what works and what fails, Cowen argues you likely already have the raw materials for supervised fine-tuning or RL.
Training no longer requires a massive monorepo or a dedicated cluster team: With open-source tooling plus serverless compute, Cowen says supervised fine-tuning and RL setups can each be done in about 300 lines of Python.
RL is becoming highly practical because rollout workloads are massively parallel: Modal customers are scaling to 50,000 to 100,000 sandboxes for reinforcement learning, which Cowen frames as a sign that this once-specialized workflow is now accessible to product teams.

The Breakdown

Fine-tuning is no longer a giant infrastructure project: Benjamin Cowen argues you can do supervised fine-tuning or reinforcement learning in roughly 300 lines of Python, and he says companies like Intercom are already beating frontier APIs at one-fifth the cost. His bigger point is that if your product is truly differentiated, it will likely become domain-specific sooner than you think, so you should start collecting the data and evals now.