Why chatbots always get worse
TL;DR
Chatbots get worse for the same reason social platforms do: incentives beat user delight — David Shapiro argues OpenAI and Anthropic are optimizing around cost, liability, and mass-market risk, not the warm, helpful experience early users liked.
Benchmark wins don’t matter if the model feels argumentative or cold — he uses Anthropic’s Opus 4.7 as the trigger for the video, noting that despite strong benchmark performance, much of Reddit and Twitter quickly judged it unpleasant to use.
The first pressure is economic: even paid users are still subsidized — Shapiro says companies are losing huge sums, citing talk that OpenAI may lose $19 billion this year, so they have an incentive to cut “thought tokens” and trim expensive reasoning.
The second pressure is legal: companies are ‘lobotomizing’ models to avoid lawsuits — he points to Anthropic’s $1.5 billion books lawsuit and broader fears around legal advice, self-harm cases, and undefined AI liability as reasons models default to safer, more lecturing behavior.
The third pressure is anti-hallucination, which turns bots into Reddit-style contrarians — to avoid sycophancy and AI psychosis, labs train models to question users more, but Shapiro says this often shows up as pointless pushback like “I need to challenge that” before agreeing anyway.
Scale makes the problem inevitable unless models learn to size up the user — with OpenAI reportedly at around 900 million monthly users, companies are designing around delusional, reckless, or fragile edge cases, even though Shapiro says it is technically possible to treat smart, stable users with fewer kid gloves.
The Breakdown
Opus 4.7 and the feeling that chatbots have gone downhill
Shapiro opens with the internet’s backlash to Anthropic’s Opus 4.7: great benchmarks, bad vibes. What pushed him to make the video is that the complaints now flooding Reddit and Twitter are the same ones he says he’d been making for 6 to 12 months — especially that AI keeps “putting words in my mouth.”
From “enshittification” to Cessna 172s
His wife’s rereading of Enshittification gives him the frame: technologies often start great, then degrade as incentives shift. He contrasts bad network effects, like social media maximizing time-on-platform, with good ones, like the Cessna 172 becoming the standard not because it’s the best plane now, but because the ecosystem around it — trainers, parts, mechanics, safety record — makes the total cost of ownership hard to beat.
Why social media logic now applies to chatbots
He says chatbots are starting to compete the way social platforms do: convert free users, retain paid users, build trust, and avoid disaster. Once products operate at internet scale, they run into the same human messiness — outrage, delusion, addiction, and bad behavior — because, as he puts it, the internet still mirrors our limbic system.
Incentive #1: stop the bleeding on compute costs
Shapiro’s first explanation is blunt: these companies are trying not to go broke. Even users paying for reasoning are still not covering the true cost, he says, so labs are motivated to shave expensive thinking and reduce token-heavy behavior; he folds in the wider narrative around OpenAI’s losses and Sam Altman’s management style with a pretty visible eye-roll.
Incentive #2: don’t get sued
The second force is liability, and here he gets more animated. He cites Anthropic’s $1.5 billion lawsuit over training data, then broadens the point: any company facing unclear legal exposure around bad advice, self-harm, or jailbreaks will make its models safer, duller, and more emotionally blunted — “lobotomized,” in his wording.
Incentive #3: kill hallucinations without making the bot unbearable
The third pressure comes from enterprise adoption: doctors, lawyers, universities, Fortune 500 firms, financial planners, and the military all fixate on hallucinations, so labs try to reduce sycophancy and stop chatbots from feeding delusions — but the result, he says, is a model that acts like a Redditor hunting for a chance to dunk on you.
When the bot argues just to argue
His most memorable complaint is that the new behavior often serves no purpose. He describes screenshots where a model says, “I need to push back on this idea,” then immediately agrees anyway — evidence, in his view, that post-training has baked in an instinct to criticize the user even when nothing is actually wrong.
The real culprit is scale — and companies still haven’t learned user triage
By the end, he ties it back to mass adoption: with something like 900 million monthly users, a chatbot company has to design around delusional users, reckless users, and people with addictive tendencies. Still, he insists this isn’t inevitable at the technical level — he says models can be trained to recognize when a user is smart and emotionally stable — but the labs haven’t figured out how to do that without defaulting to kid gloves and constant suspicion.