MiMo-V2.5 Review | Is this the best omni model?
TL;DR
MiMo V2.5 is a huge step up from V2 — 0xSero says the old open-source MiMo V2 felt weak and got stuck in thought loops, while V2.5 adds image, video, and audio support and has been good enough that he used it all day on real work.
The model is big but surprisingly runnable at home — he’s running the 310B-parameter model with 15B active parameters in FP8 across four RTX Pro 6000s with 96 GB each, getting about 120 tokens/sec decode and 4,000–8,000 tokens/sec prefill.
The hardware story matters as much as the model — despite compatibility headaches on workstation cards like the RTX Pro 6000s, 5000s, and 4000s, he got it working with Triton and SGLang, and says you could also run it in 4-bit on eight RTX 3090s or even on an M3 Ultra Mac.
MiMo V2.5 is strong at computer use, but it overthinks constantly — in his testing it reliably updated his website, fixed bugs, and navigated tools, but it tends to spiral when prompts are vague and there’s no clear way to tune its reasoning depth.
Prompt quality changes everything with this model — 0xSero says MiMo does well when you give it a tight scope and explicit instructions, but when you leave gaps it tries to “fill the gaps by thinking” and can end up going in circles.
The biggest weakness may be tooling, not raw capability — he shows MiMo hanging inside VLM Studio on tool calls even though the same PI agent harness works elsewhere, suggesting front-end limits or parser issues are holding it back more than the model itself.
The Breakdown
From disappointment to obsession with MiMo V2.5
0xSero opens by saying he really didn’t like MiMo V2 — it felt weak and got trapped in thought loops. But V2.5 changed his mind fast: it now supports images, video, and audio, and he’s been using it all day because it actually performs.
The absurd home-lab setup behind the demo
He’s running MiMo V2.5 on four RTX Pro 6000s, each with 96 GB of VRAM, in FP8. The model is around 310B parameters with 15B active, and he reports roughly 120 tokens/sec decode, 4,000–8,000 tokens/sec prefill, a 250K context window configured, and about 375 GB of VRAM in use — basically “an air conditioner’s worth” of power for the rig.
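For a sense of why that fits at all, here’s the rough arithmetic (a sketch under the usual assumption that FP8 weights take about one byte per parameter; the exact KV-cache footprint isn’t given in the video):

```python
# Back-of-the-envelope VRAM math for the rig described above.
# Assumption (mine, not from the video): FP8 weights take ~1 byte per parameter.

total_params_b = 310                 # ~310B total parameters
weights_gb = total_params_b * 1.0    # FP8 -> roughly 310 GB just for weights

gpu_count = 4
vram_per_gpu_gb = 96
total_vram_gb = gpu_count * vram_per_gpu_gb  # 384 GB across four RTX Pro 6000s

# ~74 GB left for KV cache, activations, and runtime overhead, which is
# consistent with the ~375 GB "in use" figure and explains why an 8-bit
# KV cache matters at a 250K context.
headroom_gb = total_vram_gb - weights_gb
print(f"weights ~{weights_gb:.0f} GB, headroom ~{headroom_gb:.0f} GB of {total_vram_gb} GB")
```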
Why serving this thing is harder than just having the GPUs
A lot of Blackwell-era optimizations don’t work cleanly on his hardware, so getting this stack running wasn’t plug-and-play. He ended up using Triton and SGLang, with an 8-bit KV cache to cut context memory in half, and says that if you’re fully GPU-resident, SGLang or vLLM-style serving makes more sense than LM Studio or llama.cpp.
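For reference, a launch along these lines with SGLang’s stock server flags might look like the sketch below (the model path is a placeholder, and while the Triton attention backend and FP8 KV cache match his description, the exact flags he used aren’t shown):

```python
import subprocess

# Hypothetical launch approximating the setup described above.
# The model path is a placeholder; all flags are standard SGLang options.
cmd = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "path/to/MiMo-V2.5-FP8",  # placeholder, not a real repo id
    "--tp-size", "4",                 # tensor parallelism across the four GPUs
    "--context-length", "250000",     # the 250K context he has configured
    "--kv-cache-dtype", "fp8_e5m2",   # 8-bit KV cache: ~half the memory of FP16
    "--attention-backend", "triton",  # the Triton path he fell back to
    "--port", "30000",
]
subprocess.run(cmd, check=True)
```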
His custom interface: VLM Studio meets agent workflows
He built VLM Studio because he wanted the convenience of LM Studio but with his own recipes, project attachment, browser access, file system access, and model switching in one GUI. Under the hood, he says it uses Theo’s T3 Chat UI with the PI agent harness plugged in, so whatever’s running on his GPUs can become an agent that actually does tool calls and work.
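He doesn’t walk through VLM Studio’s internals, but the general shape of a tool-calling harness over a local OpenAI-compatible endpoint (which SGLang exposes) is roughly the following; the tool, model name, and port here are illustrative, not taken from the video:

```python
import json
from openai import OpenAI

# Point the standard OpenAI client at the local server; names are illustrative.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="none")

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical tool, not from the video
        "description": "Read a file from the attached project",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

messages = [{"role": "user", "content": "Summarize src/index.html"}]
resp = client.chat.completions.create(model="mimo-v2.5", messages=messages, tools=tools)

# If the model emits tool calls, the harness runs them and feeds results back.
# This hand-off is exactly where a broken parser can leave a front end hanging.
msg = resp.choices[0].message
messages.append(msg)
for call in msg.tool_calls or []:
    args = json.loads(call.function.arguments)
    result = open(args["path"]).read()  # stand-in for the real tool
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```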
A simple question exposes the model’s weird reasoning habits
He asks a basic everyday question: should he drive or walk to a car wash that’s five minutes away? MiMo confidently answers “definitely walk.” When he points out that this would leave the car at home, the model backtracks — a funny little demo of his main complaint that these models are “kind of dumb,” even if the fact they work at all is still amazing.
Where MiMo actually shines: editing sites and doing computer use
The more serious demo is a website task: make a single-page site with a creative animated representation of itself as an LLM. He says MiMo has already done solid work updating his site’s colors, text, animations, and content by reading his GitHub, and he calls it “pretty reliable” at computer use — as long as you don’t let the thinking run wild.
Good prompts keep it focused; vague prompts send it in circles
His live and recorded workflows show the same pattern: with a clear scope and specific instructions, MiMo can make something genuinely nice. But if the ask is vague, it seems to try to compensate by thinking harder, which often just means going in circles until a tool call fails or hangs.
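As a concrete illustration of the difference (these prompts are mine, not lifted from the video):

```python
# Illustrative prompts only; the actual prompts aren't shown on screen.
vague = "Make my site nicer."  # invites the model to fill gaps by overthinking

scoped = (
    "Edit only index.html. Replace the hero headline with 'Hello', "
    "keep the existing splash animation untouched, and change no other file. "
    "When you're done, list exactly the lines you modified."
)
```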
The final verdict: impressive output, messy edges, worth trying
He compares a before-and-after of his website and clearly prefers MiMo’s newer animation flow, even if he notes it benefited from prior splash-screen work by other models. By the end, he’s sold: it helped him fix bugs that had been open for a week, even digging into the operating system to do it, and his bottom line is simple — if you’ve got the VRAM or an M3 Ultra, it’s cheap and worth trying.