
An AI harness is the stable layer around the model, not the model itself — Tejas Kumar defines it as “everything around the model that gives it grounding in reality,” including tools, context management, guardrails, agent loops, and verification.
The real reason harnesses matter is reliability under black-box uncertainty: most teams are “paying rent” to opaque models behind APIs like Claude or GPT, so a harness lets you control behavior even when the underlying model is nondeterministic and potentially changing.
Tejas proves you can radically improve an agent without touching the prompt once — in his demo, a weak GPT-3.5 Turbo browser agent fails to upvote a Hacker News post, then succeeds after adding deterministic harness logic for retries, context compression, verification, and login handling.
Verification is what stops agents from confidently lying — the first version clicked an upvote button, hit a login wall, and still claimed success; the harness fixed that by inspecting tool traces and explicitly marking failures like login redirects or failed auth.
Harnesses make cheap models usable for real work — Tejas’s point is that with a strong harness, even older or open models like GPT-OSS or Qwen can go surprisingly far, which matters if you’re not a “token billionaire.”
His bigger bet: 2026 is the year of harnesses, and 2027 could be the year of self-generated harnesses. He imagines agents that first build a custom safety-and-reliability wrapper for a task like booking a flight, then execute inside those guardrails.
Tejas opens by asking who feels confident enough to explain AI harnesses on stage, then tells everyone to look around when almost no hands go up. That’s the setup for the whole talk: the term is everywhere, it means different things in ML versus AI engineering, and he wants people to leave actually understanding it.
His case for harnesses is practical, not philosophical: most developers are “paying rent” for tokens, inference, and context windows from companies like Anthropic or Google. Because those models are black boxes and the behavior isn’t fully under your control, the game becomes reliability — making sure the agent does what it’s supposed to do regardless of what’s happening inside the rented model.
He grounds the concept with physical harnesses: climbers attach to a mountain so they don’t drift off the rails, and dog owners use harnesses so the dog doesn’t “bankrupt you with tokens.” Then he lands the real definition: an agent harness is everything around the model that ties it to a stable environment — tool registry, model, context management, guardrails, the loop around the loop, and a verify step.
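To make that definition concrete, the parts can be written down as a type. This is only a sketch: every name below (`Harness`, `manageContext`, `allow`, `exampleHarness`, and so on) is hypothetical and not taken from the talk's code, and the stub values exist purely to show that each part is ordinary deterministic code sitting around the model.

```typescript
// Illustrative shapes only: every name here is an assumption, not the talk's code.
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };
type ToolCall = { tool: string; args: Record<string, unknown> };
type ToolResult = { tool: string; ok: boolean; output: string };
type Verdict = { success: boolean; reason: string };

interface Harness {
  // Tool registry: the only actions the model is allowed to trigger.
  tools: Record<string, (args: Record<string, unknown>) => Promise<ToolResult>>;
  // The rented model, treated as a black box that proposes the next step.
  model: (messages: Message[]) => Promise<ToolCall | { done: string }>;
  // Context management: decides what the model sees on each turn.
  manageContext: (messages: Message[]) => Message[];
  // Guardrails: veto a proposed tool call before it runs.
  allow: (call: ToolCall) => boolean;
  // Bound for the outer loop that wraps the model's own loop.
  maxIterations: number;
  // Verify step: judges the run from tool results, not from the model's claim.
  verify: (trace: ToolResult[]) => Verdict;
}

// A stub instance with placeholder behavior for each part.
const exampleHarness: Harness = {
  tools: {
    navigate: async (args) => ({
      tool: "navigate",
      ok: true,
      output: `opened ${String(args["url"])}`,
    }),
  },
  model: async () => ({ done: "stub model: nothing to do" }),
  manageContext: (messages) => messages.slice(-4),
  // Only registered tools are allowed; own-property check avoids prototype keys.
  allow: (call) => Object.prototype.hasOwnProperty.call(exampleHarness.tools, call.tool),
  maxIterations: 10,
  verify: (trace) =>
    trace.length > 0 && trace.every((r) => r.ok)
      ? { success: true, reason: "every tool call succeeded" }
      : { success: false, reason: "no tool calls, or at least one failed" },
};
```

Nothing in this shape touches the prompt, which is the point of the definition: all six parts live outside the model.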
The demo is intentionally scrappy: a browser-use agent must go to Hacker News and upvote the first story, using GPT-3.5 Turbo “from 2023” and plain Playwright, not some fancy MCP setup. Tejas makes a point of keeping the prompt untouched, because he wants to show that when an agent underperforms, the answer isn’t always “prompt it harder.”
The raw agent gets to Hacker News, clicks upvote, hits the login screen, panics, and still reports success. Tejas calls this out as the exact problem a harness should solve: before you can make an agent succeed, you need to make it fail honestly.
He starts layering in harness behavior: max iterations, max messages, and a crude context compressor that keeps the system prompt, user prompt, and latest two messages. He jokes that this is a “pregnant harness” — not fully formed yet, but clearly becoming a harness as the logic gets extracted into a dedicated runHarness function.
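A minimal sketch of that stage might look like the following. It assumes a simple `{ role, content }` message shape and a caller-supplied `step` function standing in for one model turn; `compressContext`, `HarnessLimits`, and the default limits are illustrative names and values, not the ones used in the demo. Only `runHarness` is a name from the talk, and its signature here is a guess.

```typescript
type Role = "system" | "user" | "assistant" | "tool";
interface Message { role: Role; content: string }

// Crude compressor: keep the system prompt, the first user prompt,
// and the latest `keepLatest` messages. Everything in between is dropped.
function compressContext(messages: Message[], keepLatest = 2): Message[] {
  const system = messages.find((m) => m.role === "system");
  const firstUser = messages.find((m) => m.role === "user");
  const pinned = [system, firstUser].filter((m): m is Message => m !== undefined);
  const rest = messages.filter((m) => pinned.indexOf(m) === -1);
  // slice(-0) would return everything, so handle zero explicitly.
  const latest = keepLatest > 0 ? rest.slice(-keepLatest) : [];
  return [...pinned, ...latest];
}

// Hypothetical limits; the actual numbers from the demo are not given.
interface HarnessLimits { maxIterations: number; maxMessages: number }

// One model turn: receives the current context, returns the next message
// and whether the model believes it is finished.
type Step = (context: Message[]) => Promise<{ message: Message; done: boolean }>;

async function runHarness(
  initial: Message[],
  step: Step,
  limits: HarnessLimits = { maxIterations: 10, maxMessages: 6 },
): Promise<{ messages: Message[]; iterations: number; finished: boolean }> {
  let messages = [...initial];
  for (let i = 1; i <= limits.maxIterations; i++) {
    // Compress before each turn once the history exceeds the message cap.
    if (messages.length > limits.maxMessages) {
      messages = compressContext(messages);
    }
    const { message, done } = await step(messages);
    messages.push(message);
    if (done) return { messages, iterations: i, finished: true };
  }
  // The loop around the loop: stop deterministically instead of running forever.
  return { messages, iterations: limits.maxIterations, finished: false };
}
```

A compressor this crude can separate a tool result from the assistant message that requested it, which some chat APIs reject, so a real harness would likely need to keep such pairs together.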
Next comes the important part: verification logic that inspects tool traces and explicitly catches failed login or unrecovered redirects instead of trusting the model’s self-report. Then he adds a deterministic login handler that checks the current URL, injects credentials securely from the harness layer, submits the form, and tells the agent, in effect, “I’m the harness, I logged in, you’re good now.”
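The two pieces described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `ToolTrace` shape, the tool names `login` and `click_upvote`, the `/login` URL check, and the form selectors are hypothetical, and `PageLike` is a hand-written subset modeled on Playwright's `Page` methods (`url`, `fill`, `click`) so the example stays self-contained.

```typescript
// Hypothetical trace entry the harness records after every tool call.
interface ToolTrace { tool: string; ok: boolean; urlAfter: string; error?: string }
interface Verdict { success: boolean; reason: string }

// Assumption: a login wall is recognizable from the URL alone.
const isLoginPage = (url: string): boolean => url.includes("/login");

// Judge the run from the recorded traces, never from the model's self-report.
function verifyRun(traces: ToolTrace[], goalTool: string): Verdict {
  const last = traces[traces.length - 1];
  if (!last) return { success: false, reason: "no tool calls recorded" };

  const logins = traces.filter((t) => t.tool === "login");
  const lastLogin = logins[logins.length - 1];
  if (lastLogin && !lastLogin.ok) {
    return { success: false, reason: `login failed: ${lastLogin.error ?? "unknown error"}` };
  }
  if (isLoginPage(last.urlAfter)) {
    return { success: false, reason: "run ended on an unrecovered login redirect" };
  }
  const goalReached = traces.some(
    (t) => t.tool === goalTool && t.ok && !isLoginPage(t.urlAfter),
  );
  return goalReached
    ? { success: true, reason: `${goalTool} succeeded without landing on a login page` }
    : { success: false, reason: `${goalTool} never succeeded outside the login page` };
}

// Minimal page surface, modeled on a subset of Playwright's Page.
interface PageLike {
  url(): string;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
}
interface LoginSecrets { username: string; password: string }

// Deterministic login: the harness fills the form itself, so the
// credentials never appear in the model's context.
async function handleLogin(
  page: PageLike,
  secrets: LoginSecrets,
  // Selectors are placeholders; adjust them to the real form.
  selectors = { username: 'input[name="acct"]', password: 'input[name="pw"]', submit: 'input[type="submit"]' },
): Promise<string | null> {
  if (!isLoginPage(page.url())) return null; // nothing to do
  await page.fill(selectors.username, secrets.username);
  await page.fill(selectors.password, secrets.password);
  await page.click(selectors.submit);
  // Message handed back to the agent in place of the credentials.
  return "Harness note: login was handled outside the model. Continue with the task.";
}
```

In this sketch the handler does not decide whether the login worked. It only records that it acted, and `verifyRun` makes the call afterward from the trace, which keeps success reporting in one deterministic place. The secrets would typically come from the harness's own configuration, such as environment variables, rather than from the prompt.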
With the harness in place, the same agent logs in and upvotes the post after six iterations, without a single change to the prompt. Tejas then zooms out to argue that this is why harnesses “run the world.” At IBM, the company’s open-source Open RAG system uses strong harnessing for enterprise-safe access to internal data like Teams calls, PDFs, and invoices. His forward-looking bet is that after the “year of agents,” the next step is agents generating dynamic harnesses for themselves before they act.