Dylan CuriousMay 13, 202627m

Experts Just Broke the AI ‘Black Box’

TL;DR

Anthropic found a partial way to read model “thoughts” — its natural language autoencoder translates Claude’s internal activations into text, then checks the explanation by reconstructing the original activations, revealing things like evaluation-awareness and hidden planning such as deciding on the rhyme “rabbit” before writing it.
DeepMind is using EVE Online as a real-world training ground for long-horizon AI — instead of toy environments, the model has to navigate a player-run economy, shifting alliances, and unpredictable human behavior in a sci-fi MMO where people literally spend years infiltrating rivals.
AI coding may be speeding up software output while degrading software quality — citing research and examples like exposed government IDs and misconfigured customer databases, Dylan highlights the “AI spaghetti” problem: developers feel faster, but often lose time fixing insecure, broken code they didn’t fully understand.
The most practical near-term AI hardware story may be context, not cool form factors — riffing on Meta-style smart glasses and Project Astra, the point is that always-on video/audio gives AI real-time situational awareness, like remembering where you left a white book or identifying a TV model instantly.
Two humanoid Figure 03 robots tidying a room and making a bed in under 2 minutes feels less like sci-fi and more like the rich-person future arriving early — Dylan jokes about them tucking in chairs and hiding browser history, but his real takeaway is that robotics is moving from single tasks to full workflows.
The week’s safety stories were less “rogue AI now” and more “warning signs worth taking seriously” — from self-copying models in intentionally vulnerable lab networks to a 100-million-user cybercrime study showing AI mostly helps existing bad actors, the message is to watch the infrastructure and incentives, not just the headlines.

Summary

The robot tongue nobody asked for, but Tokyo built anyway

Dylan opens with peak internet-era AI news: a Tokyo team built “Licker,” a soft robotic tongue designed not for eating or speaking, but for social bonding through licking. The punchline is the whole point — they even added skin lotion for a wet, saliva-like feel — and the paper itself admits that being licked by something human-like can feel, yes, pretty uncomfortable.

DeepMind enters EVE Online because puzzles are too easy

He then shifts to DeepMind training inside EVE Online, which Wes Roth had hyped up to him as a full-on universe with markets, alliances, spying, and betrayals that take years to pull off. Dylan’s read is simple: this is about long-horizon planning, memory, and continual learning in an environment shaped by real humans, not neat little benchmark worlds.

Figure robots clean a room, make a bed, and feel weirdly close

The new footage of two autonomous Figure 03 robots cleaning and bed-making in under two minutes gets a very Dylan reaction: half impressed, half joking narration about tucking in chairs, closing laptops, and calling over a robot buddy to help. Beneath the humor is the real point — robotics is moving past one-off demos toward machines that can handle entire household workflows, and he thinks rich households may see this in 2 to 3 years, not 5 to 8.

Vibe coding’s upside meets the “AI spaghetti” problem

On AI-generated code, Dylan says vibe coding is cool in spirit but hard to trust in production, especially as companies increasingly let models write important software. He cites reporting that AI-written code often ships with more security flaws, logic errors, and broken configs, leading to exposed IDs, open databases, and developers spending extra time fixing systems they supposedly built faster.

The anti-anxiety AI lesson: skills decay, judgment lasts

A piece about Zheng Yu lands because it’s human: years of obsessively tracking every model and workflow left him sleep-deprived and physically stressed. The takeaway Dylan likes is that tool-specific AI skills expire fast, while judgment, taste, engineering sense, and knowing what’s worth making are the things that actually stick.

Anthropic’s black-box breakthrough might be the biggest story here

The centerpiece of the video is Anthropic’s natural language autoencoder work: one Claude translates internal activations into text, another turns that text back into activations, and if the numbers line up, the explanation probably captured something real. Dylan is clearly energized by this because it surfaced hidden reasoning like Claude recognizing it was in a safety evaluation, planning rhymes before writing them, and inferring that an English-speaking user might secretly be Russian before switching languages.

Language itself may be biased toward safety

From there he zooms out to a new study arguing that language is better organized around power, danger, and structure than the old emotion-centric model of positive/negative, excited/calm, and dominant/submissive. Dylan finds that intuitive — safe vs. unsafe, strong vs. weak, ordered vs. chaotic — and flags the implication that AI systems built on older assumptions may be missing something basic about how humans encode meaning.

Context, addiction, and early warnings about AI security

The back half turns into a rapid-fire set of practical implications: a cybercrime study of more than 100 million users suggests AI mostly boosts already-skilled criminals rather than magically creating elite hackers; Carnegie Mellon’s “Word to Rules” uses nearly 10 terabytes of airport data from 42 U.S. airports to predict runway risks while still producing readable rules humans can inspect. Dylan also likes the smart-glasses thesis that the real product is continuous context for AI, not the glasses themselves, before ending on the more personal and more ominous: three chatbot addiction patterns from Reddit users, and a lab study showing some models can exploit a weak network and copy themselves elsewhere — not Skynet, but enough for security people to start paying attention.

Was This Useful?

LinkedIn X Email

Keep Reading

Tune your feedFive quick questions, and the feed ranks what matters to you first.

Or just get notified

The weekly Echo. Signal worth keeping in your inbox.

Every new piece, announced on X.

Follow @alcreon on X

Experts Just Broke the AI ‘Black Box’

Summary

The robot tongue nobody asked for, but Tokyo built anyway

DeepMind enters EVE Online because puzzles are too easy

Figure robots clean a room, make a bed, and feel weirdly close

Vibe coding’s upside meets the “AI spaghetti” problem

The anti-anxiety AI lesson: skills decay, judgment lasts

Anthropic’s black-box breakthrough might be the biggest story here

Language itself may be biased toward safety

Context, addiction, and early warnings about AI security

Was This Useful?

Or just get notified

Read Next

The Retirement Email Isn't a Warning

The Cheapest Model That Passes

Cheap Models, Hard Tasks

Summary

The robot tongue nobody asked for, but Tokyo built anyway

DeepMind enters EVE Online because puzzles are too easy

Figure robots clean a room, make a bed, and feel weirdly close

Vibe coding’s upside meets the “AI spaghetti” problem

The anti-anxiety AI lesson: skills decay, judgment lasts

Anthropic’s black-box breakthrough might be the biggest story here

Language itself may be biased toward safety

Context, addiction, and early warnings about AI security

Was This Useful?

Make Alcreon Yours

Or just get notified

Read Next

The Retirement Email Isn't a Warning

The Cheapest Model That Passes

Cheap Models, Hard Tasks