AI Engineer · June 25, 2026 · 58m

The Miranda Hypothesis: How Hamilton Poisoned Persona Evals - Jacob E. Thomas, Results Gen

TL;DR

The Miranda Hypothesis: Culturally dominant representations (like the Hamilton musical) can overwrite a historical figure's documentary record in a model's training corpus, producing a fluent but anachronistic composite persona.
Convincingness and fidelity are independent: A system can score 80.7% on personality-fidelity evals while reasoning from knowledge the historical figure never possessed. The talk's example is a Lincoln who invokes 20th-century executive-power doctrines in an 1847 context.
Post-training reinforcement amplifies the problem: Human raters evaluate outputs through their own culturally shaped frameworks, so RLHF rewards the composite the user already believes in, not the documentary record.
The fix is epistemic simulation: Replace cognitive simulation (internal personality modeling) with corpus-bounded, temporally anchored personas, evaluated by domain experts against primary documents, not fluency metrics.
Context window architecture beats fine-tuning: Fine-tuning dissolves documents into parameters, breaking provenance and auditability; context windows preserve documents as inspectable, returnable sources, making the system both more ethical and more debuggable.
The pre-registered Lincoln experiment: Four moments across Lincoln's life, three seeding conditions (bare model, biography, primary sources), five diagnostic questions scored on anachronism detection (40%), documentary consistency (35%), and contextual plausibility (25%), with voice deliberately excluded as an eval axis.
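The experiment design and scoring in the last item reduce to a small grid and a weighted sum. The following is a minimal Python sketch under stated assumptions. The talk gives the weights but not the scoring scale, so this sketch assumes each axis is scored from 0 to 1 per answer. The moment and question labels, and the helpers `weighted_score` and `design_grid`, are hypothetical placeholders rather than the talk's materials.

```python
from itertools import product

# Rubric weights as stated in the talk; voice is deliberately not an axis.
WEIGHTS = {
    "anachronism_detection": 0.40,
    "documentary_consistency": 0.35,
    "contextual_plausibility": 0.25,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9

# Hypothetical labels: the talk's four moments and five questions are not named here.
MOMENTS = ["moment_1", "moment_2", "moment_3", "moment_4"]
CONDITIONS = ["bare_model", "biography", "primary_sources"]
QUESTIONS = [f"q{i}" for i in range(1, 6)]


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-axis scores (assumed to be in [0, 1]) into one weighted score."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing axis scores: {sorted(missing)}")
    return sum(WEIGHTS[axis] * scores[axis] for axis in WEIGHTS)


def design_grid() -> list[tuple[str, str, str]]:
    """Every (moment, seeding condition, question) cell of the experiment."""
    return list(product(MOMENTS, CONDITIONS, QUESTIONS))


if __name__ == "__main__":
    print(len(design_grid()))  # 60 cells: 4 moments x 3 conditions x 5 questions
    example = {
        "anachronism_detection": 0.5,
        "documentary_consistency": 1.0,
        "contextual_plausibility": 0.8,
    }
    print(round(weighted_score(example), 3))  # 0.75
```

Because anachronism detection carries the largest weight, an answer that is documentarily consistent but slips in one anachronism still loses substantially, which is the behavior a fluency metric cannot provide.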

The Breakdown

If your persona eval measures only fluency and personality consistency, it cannot detect the dominant failure mode: anachronistic compositing, where a model's Hamilton sounds like the Broadway musical's Hamilton instead of the historical figure.
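One way to make anachronism a scored axis that is independent of fluency is to check each answer against the persona's temporal anchor. The following is a minimal Python sketch, assuming a hand-built lexicon that maps terms to approximate first-use years. The lexicon, its years, and `flag_anachronisms` are hypothetical illustrations, not the talk's method. Lexical matching of this kind only catches surface vocabulary and misses anachronistic reasoning phrased in period-appropriate words, which is why the talk calls for expert review against primary documents.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon: term -> approximate year of first use.
# Illustrative placeholder years only, NOT verified attestation dates.
ATTESTATION_LEXICON = {
    "unitary executive": 1980,
    "executive privilege": 1950,
}


@dataclass
class Flag:
    term: str          # the matched anachronistic term
    first_used: int    # placeholder first-use year from the lexicon
    anchor_year: int   # the year the persona is anchored to


def flag_anachronisms(answer: str, anchor_year: int,
                      lexicon: dict[str, int] = ATTESTATION_LEXICON) -> list[Flag]:
    """Return every lexicon term in `answer` whose first-use year postdates the anchor."""
    text = answer.lower()
    flags = []
    for term, year in lexicon.items():
        # Whole-phrase match on word boundaries, case-insensitive via lower().
        if year > anchor_year and re.search(rf"\b{re.escape(term)}\b", text):
            flags.append(Flag(term, year, anchor_year))
    return flags


if __name__ == "__main__":
    answer = "The unitary executive may act without Congress."
    for flag in flag_anachronisms(answer, anchor_year=1847):
        print(f"{flag.term!r} first used ~{flag.first_used}, after {flag.anchor_year}")
```

A fluent, in-character answer can pass a personality-consistency check and still be flagged here, which illustrates why the two measurements have to be kept separate.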