Mo Bitar · 5m

AI is struggling

TL;DR

  • Fake citations are already showing up in high-stakes consulting work — Mo opens with two Deloitte examples: a $290,000 Australian welfare compliance report and a $1.6 million Canadian healthcare report that included nonexistent papers, fabricated quotes, and even a professor incorrectly listed as an author.

  • MIT’s large evaluation says AI is mostly “minimally sufficient,” not excellent — across 11,000 real job tasks graded by humans, models produced work that met a bare minimum standard about 65% of the time, while “superior” performance never rose above 50%.

  • AI performed worst where people most want to trust it — Mo says the study found weak results in legal work, IT, and complex analysis, while the strongest performance was on low-stakes admin tasks like construction paperwork and maintenance logs.

  • His core claim is blunt: today’s AI is still autocomplete, not understanding — he argues LLMs predict the next word from internet patterns rather than think, which makes the leap from current systems to AGI “the distance between a parrot and a person.”

  • The real business sleight of hand is selling future AGI to excuse present flaws — Mo frames the pitch as: yes, current models hallucinate and earn a “C-minus from MIT,” but just give companies more money, more time, and fewer hard questions.

  • Humans aren’t valuable because we’re perfect; we’re valuable because we can wrestle with being wrong — his closing point is that doubt, regret, and the struggle for truth are the mechanism of human judgment, which is why computers can speed up boring work but not replace people.

The Breakdown

The Deloitte Report That Fell Apart on Google

Mo starts with a government researcher in Canberra reading a 237-page welfare compliance report Deloitte was paid $290,000 to produce. One citation looks odd, then another, then another — and suddenly he realizes the sources don’t exist, including a quote wrongly attributed to a federal judge. Mo’s punchline is brutal: the government paid nearly three hundred thousand dollars for something that was, "academically speaking, fanfiction."

Canada Repeats the Same Disaster at Bigger Scale

A few weeks later, he says, Deloitte does it again in Canada on a $1.6 million healthcare report. A Nova Scotia professor finds her own name listed as author on a paper she’s never seen, leaving her wondering whether she had forgotten writing it or was "losing her mind." Deloitte’s defense — that they still stand behind the recommendations — becomes one of Mo’s best lines: "You can’t stand behind something that doesn’t have a floor."

MIT Gives AI a Real Job Review

From there he widens the lens: this isn’t just a Deloitte problem, because MIT tested major AI models on 11,000 tasks pulled from real jobs and had humans grade the output. The result, in his telling, was basically: "it’s fine, I guess." About 65% of the time the work was only "minimally sufficient," and once the standard moved up to genuinely strong work, the models never broke 50%.

Where AI Helps, and Where It Falls Apart

Mo highlights the split in performance: AI struggled most with skilled work like legal tasks, IT, and complex analysis, and did best on straightforward admin work like construction paperwork and maintenance logs. That supports his recurring thesis on the channel: AI won’t replace you, but it can absolutely help with the work you already hate doing. Useful, yes; trustworthy as a stand-in for expertise, not even close.

“It’s Basically Autocomplete”

Then comes the central argument: the part nobody wants to say out loud is that current AI is still just autocomplete. He acknowledges that calling it that is unfashionable, but insists that underneath the hype it’s still next-word prediction based on patterns scraped from the internet, not thinking or understanding. The mismatch, for him, is between what the system actually does and what the market wants people to believe it’s about to become.

The AGI Pitch as Misdirection

Mo frames AGI talk as a giant fundraising maneuver: companies admit today’s models hallucinate, commit "academic fraud," and effectively got a C-minus from MIT, but promise a human-equivalent system is coming soon — just not on any timeline you can pin down. His metaphor is the video’s anchor: we are treating a parrot like it’s one update away from becoming a person. A parrot can say "I love you" at the perfect moment, he says, but it still doesn’t understand love.

Why Humans Still Matter

He closes by drawing the line between human error and machine error. Humans are biased, emotional, and manipulable too, but we can realize we were wrong, feel doubt, and revise our beliefs — and that struggle is exactly what matters. So his final claim is simple: LLMs are tools that speed humans up on boring tasks, not replacements for judgment, responsibility, or expertise.