Alex Finn | June 19, 2026 | 18m

How to get unlimited AI for free (GLM 5.2 local)

TL;DR

GLM 5.2 impressed Alex enough to compare it to Opus 4.8: He says its output on his 3D first-person shooter test was roughly on par with Claude Opus 4.8's, including game logic, effects, enemies, waves, ammo, and scoring.
The headline setup is local AI on a single machine: Using Unsloth's 2-bit quantized release, he ran GLM 5.2 locally on one Mac Studio, with the model weighing about 250 GB and realistically needing a 256 GB to 512 GB memory class machine.
The best local use case is slow, passive work that would be expensive in the cloud: His example is a Hermes agent that checks the codebase of his SaaS, Henry Intelligent Machines, for security issues and bugs every two hours, around the clock.
Privacy is a core reason to run local: He contrasts ChatGPT and Claude, where prompts go to data centers and live on company servers, with local models where prompts stay on your own computer and cost only electricity.
You do not need elite hardware to start experimenting: He says smaller Macs can still run models like Gemma 4 or Nvidia's Nemotron, while Qwen 3.6 27B is his recommendation for mid-tier hardware and GLM is for higher-end setups.
His broader claim is that personal desktop superintelligence is about a year away: Alex predicts that within 12 months, cheap machines like entry-level Mac Minis will run AI that is good enough for 90 percent of people, handling documents, code, and monitoring tasks in the background.
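
To make the memory claim concrete, here is a small arithmetic sketch of how much memory is left once roughly 250 GB of weights are loaded on the two machine sizes mentioned above. The 250 GB figure comes from the summary; the function name is illustrative, and the sketch ignores everything else that competes for memory (the operating system, the context cache, other apps).

```python
def memory_headroom_gb(machine_gb: float, model_gb: float) -> float:
    """Memory left over after the model weights are loaded.

    This is plain subtraction; it does not model OS overhead or the
    context cache, both of which also need room.
    """
    return machine_gb - model_gb


MODEL_GB = 250  # approximate weight size quoted in the summary

for machine_gb in (256, 512):
    spare = memory_headroom_gb(machine_gb, MODEL_GB)
    print(f"{machine_gb} GB machine: {spare} GB left after loading weights")
```

On these numbers a 256 GB machine has only a few gigabytes to spare, which suggests why the 512 GB end of the range is the more comfortable fit.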

The Breakdown

Alex Finn says GLM 5.2 is close to Claude Opus 4.8, and that an Unsloth 2-bit version can now run locally on a single high-memory Mac Studio, giving effectively free, private, always-on AI. His big pitch is not just the model quality, but the shift to personal agents that can work on your machine 24/7 without cloud costs.
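
The always-on pattern from the TL;DR (a scan every two hours) can be sketched as a simple loop. This is a minimal illustration, not Alex's actual setup: `scan_codebase` is a hypothetical placeholder for whatever the agent really does, and `run_periodically` is a name invented here. The `sleep` argument is injectable so the demo at the bottom finishes instantly instead of waiting two hours.

```python
import time
from typing import Callable, List

TWO_HOURS_S = 2 * 60 * 60  # interval quoted in the summary


def scan_codebase() -> str:
    """Hypothetical placeholder: a real agent would ask the local model
    to review the repository for bugs and security issues."""
    return "no issues found"


def run_periodically(
    task: Callable[[], str],
    interval_s: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run `task` `max_runs` times, pausing `interval_s` between runs."""
    results = []
    for i in range(max_runs):
        results.append(task())
        if i < max_runs - 1:  # no pause needed after the final run
            sleep(interval_s)
    return results


# A two-hour interval works out to 12 scans per day.
runs_per_day = (24 * 60 * 60) // TWO_HOURS_S

# Demo with a no-op sleep so nothing actually waits.
demo_results = run_periodically(scan_codebase, TWO_HOURS_S, max_runs=2,
                                sleep=lambda seconds: None)
```

Because the work runs locally, the cost of those repeated scans is the machine's electricity rather than per-request cloud charges, which is the trade-off the video emphasizes.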