Back to Podcast Digest
AI Engineer12m

What Lies Beneath the API — Benjamin Cowen, Modal

TL;DR

  • Fine-tuning is becoming the natural next step for mature AI products: Cowen says as products specialize, teams increasingly turn to custom models for better performance, lower cost, and tighter alignment with their actual business logic.

  • Frontier APIs are fast to start with, but weak on customization: Prompt tricks like "caveman mode" can cut tokens, but they do not solve hard requirements around latency, throughput, or domain-specific metrics when a startup scales 100x or lands enterprise contracts.

  • The economics can flip hard in favor of custom models: Cowen cites Intercom beating its frontier API at 1/5 the cost, while Pinterest reportedly saw orders-of-magnitude gains from fine-tuning.

  • You may already have the ingredients to train a model: If you have an agent harness, product evals, and logs showing what works and what fails, Cowen argues you likely already have the raw materials for supervised fine-tuning or RL.

  • Training no longer requires a massive monorepo or a dedicated cluster team: With open-source tooling plus serverless compute, Cowen says supervised fine-tuning and RL setups can each be done in about 300 lines of Python.

  • RL is becoming highly practical because rollout workloads are massively parallel: Modal customers are scaling to 50,000 to 100,000 sandboxes for reinforcement learning, which Cowen frames as a sign that this once-specialized workflow is now accessible to product teams.

The Breakdown

Fine-tuning is no longer a giant infrastructure project: Benjamin Cowen argues you can do supervised fine-tuning or reinforcement learning in roughly 300 lines of Python, and he says companies like Intercom are already beating frontier APIs at one-fifth the cost. His bigger point is that if your product is truly differentiated, it will likely become domain-specific sooner than you think, so you should start collecting the data and evals now.

Was This Useful?

Share