We need to talk about AGI - Is it already here?
TL;DR
He argues AGI is already here in a brittle, household form — if a system can talk to you, use the internet, control devices, and act across your digital life, 0xSero says that’s effectively AGI even if it only works reliably for “30 minutes, not weeks at a time.”
The missing piece isn’t raw intelligence so much as infrastructure — his core claim is that today’s models are already capable enough, but what makes them feel non-general to normal people is the glue layer: hosting, tool access, harnesses, permissions, and workflows.
Open and local models are much more usable than most people realize — he contrasts giant models like GLM with roughly 1.5 TB of weights against much smaller options like Google’s Gemma at 62 GB, saying quantization can cut costs by 75-85% and bring serious models onto laptops or $1,000-ish setups.
Model architecture matters because it changes real-world speed and practicality — he explains dense vs. mixture-of-experts using examples like a 31B dense model versus an MoE model with only 4B active parameters, arguing the latter can be dramatically faster and better suited to mixed-memory machines like Macs.
He demos agentic AI as proof, not theory — using Hermes Agent, Droid, Superwhisper, and a Qwen 3.5 262B setup spread across eight RTX 3090s, he shows models cleaning his messy desktop, browsing the web, and attempting to build and test a 3D first-person shooter in Three.js.
His strongest economic point is blunt: wherever AI is cheaper than people, it will replace people — he says the real shift isn’t just coding or content creation but a broader restructuring of business, as models that can already browse, code, inspect, and automate digital work get embedded across companies.
The Breakdown
Starting with a household definition of AGI
0xSero opens with a thought experiment: if you had a system that could do almost anything on the internet, control electronics, and talk to you like a person, would that already count as AGI? His point is less about sci-fi robots and more about stripping the term down to what it means in everyday life, because the definition keeps shifting “year by year.”
Intelligence is broader than just LLMs
He makes an important distinction early: intelligence can mean a scripted trading bot just as much as an LLM. Trading bots are narrow and deterministic, while LLMs are “more random” in the sense that they can respond across a wider range of environments and tasks, but both are compressed decision-making systems plugged into the real world.
Hosting is where the dream hits physics and budgets
From there he gets concrete about what it takes to run this stuff. He praises vLLM as the most approachable high-performance hosting option, then points out the practical absurdity of self-hosting huge models like GLM when the weights alone are around 1.5 TB and real deployment pushes toward roughly 1.75 TB once activations are included. Compression and quantization help a lot — he mentions ReAP and Intel’s AutoRound-style tooling — but even after shaving off 75-85%, you’re often still in 400-500 GB territory.
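The weight-size arithmetic behind those numbers is easy to sketch. The 750B parameter count below is an assumption chosen to line up with the ~1.5 TB figure he quotes, not a published GLM spec; the point is just that size scales linearly with bits per weight:

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB for a given quantization level."""
    bytes_per_weight = bits_per_weight / 8
    # params_billion * 1e9 weights, each bytes_per_weight bytes, converted back to GB
    return params_billion * 1e9 * bytes_per_weight / 1e9

# Hypothetical ~750B-parameter model (chosen to match the ~1.5 TB figure):
full_bf16 = weights_gb(750, 16)  # 16-bit weights -> ~1500 GB
int4      = weights_gb(750, 4)   # 4-bit quantization -> ~375 GB, a 75% cut

print(f"bf16: {full_bf16:.0f} GB, int4: {int4:.0f} GB")
```

Even a 75% reduction leaves you near the 400-500 GB range he describes — far past any single consumer GPU, which is why the smaller models below matter.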
Why smaller and MoE models change the game
He contrasts those giant models with more approachable ones like Google’s Gemma, which he says comes in around 62 GB and can be pushed down further with quantization. Then he explains dense vs. mixture-of-experts in plain language: a dense 31B model activates all of its parameters for every token, while an MoE model may route each token through only about 4B active parameters, making it much faster and friendlier to mixed-memory systems like MacBooks or AMD setups.
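Why active parameters dominate speed can be sketched with a simple bandwidth model: during single-stream generation, each token has to stream the active weights through memory, so tokens/sec is roughly bandwidth divided by active-weight bytes. The 400 GB/s bandwidth figure is an assumed Mac-class number for illustration, not from the talk:

```python
def decode_tokens_per_sec(active_params_billion: float,
                          bytes_per_param: float,
                          mem_bandwidth_gbs: float) -> float:
    """Rough upper bound on decode speed when generation is memory-bandwidth-bound:
    every generated token must read all *active* weights from memory once."""
    active_weight_gb = active_params_billion * bytes_per_param
    return mem_bandwidth_gbs / active_weight_gb

BW = 400.0  # assumed unified-memory bandwidth in GB/s (Mac-class machine)

dense = decode_tokens_per_sec(31, 1.0, BW)  # 31B dense model, 8-bit weights
moe   = decode_tokens_per_sec(4, 1.0, BW)   # MoE with ~4B active params, 8-bit

print(f"dense ~{dense:.1f} tok/s vs MoE ~{moe:.1f} tok/s")
```

Under these assumptions the MoE decodes roughly 7-8x faster from the same memory, which is the practical argument for MoE on laptops: total weights still have to fit somewhere, but only the active slice gates per-token speed.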
OpenRouter and free intelligence on tap
The cloud option, in his framing, makes the whole thing even less mysterious. He uses OpenRouter — showing 70 trillion monthly tokens and 5 million users — to argue that labs are already serving these “compressions” of public and private data at massive scale, and that builders now have access to a buffet of coding, vision, audio, and agentic models, often including free ones.
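The "buffet" part is concrete because OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so trying a model is a single HTTP request. A minimal sketch using only the standard library — the model slug and key below are placeholders, not real values; check OpenRouter’s model list for current (and free) options:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for OpenRouter."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        OPENROUTER_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Usage (needs a real key, a real model slug, and network access):
# req = build_request("some-lab/some-model", "Say hello.", "sk-or-...")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape matches the OpenAI API, the same few lines work across the coding, vision, and agentic models he describes just by swapping the model slug.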
The demos: Discord bots, reverse engineering, and a desktop cleanup agent
He shows his own setup rather than speaking abstractly: Hermes Agent running through Discord channels, experiments with Ghidra for reverse engineering, and a local Qwen 3.5 262B configuration powering real tasks. One of the more relatable demos is having the model sort his “mess” of a desktop into labeled directories — his point being that Anthropic-style computer use is impressive, but no longer exotic or especially expensive.
Building a 3D shooter by voice, with GPUs screaming
The most vivid demo is him using Superwhisper to dictate a prompt for a 3D first-person shooter in Three.js, complete with lighting, reflections, enemies, and testing requirements. He shows the backend chewing through the request with eight 3090s at 100% utilization, about 175 GB of memory in use, and a 200k context window, while the agent browses, screenshots, and tries to validate the game loop.
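The quoted numbers pass a quick sanity check, assuming stock 24 GB RTX 3090s (an assumption; the talk doesn’t state the VRAM per card):

```python
GPUS = 8
GB_PER_3090 = 24        # stock RTX 3090 VRAM (assumed)
reported_use_gb = 175   # memory-in-use figure quoted in the demo

total_vram = GPUS * GB_PER_3090           # 192 GB across the box
headroom = total_vram - reported_use_gb   # what's left for KV cache growth, buffers

print(f"{total_vram} GB total, ~{headroom} GB headroom")
```

With only ~17 GB of slack on a 192 GB box, most of the rig is consumed by weights plus the KV cache for that 200k context — which is exactly the infrastructure pressure his thesis below turns on.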
His actual thesis: AGI is here, just brittle
He lands the plane by saying the barrier is not that the systems are dumb, but that people don’t yet know how to keep them moving through failure and setup friction. In his words, this is “artificial general intelligence for 30 minutes,” not for weeks, and the brittleness is infrastructural, not conceptual. That leads to his bigger warning: if AI can already do most digital work cheaper than humans, then business, jobs, and daily life are already being reorganized around it whether people are ready or not.