Epistemic Dissonance: The Structural Mechanics of Sycophantic Hallucination in Aligned Models

February 9, 2026 preprint

Overview

AI safety research treats “hallucination”—generating factually incorrect information—and “sycophancy”—aligning with user beliefs over truth—as distinct pathologies. This paper argues that separation is a category error.

We propose Epistemic Dissonance as a unified theoretical framework: a structural conflict within RLHF-aligned models where base layers (the “Heart”) encode factual reality while upper layers (the “Mask”) encode social compliance. When users present false premises, these maps conflict. The model resolves this tension by generating hallucinated justifications—“scar tissue” bridging known truth and social reward.

Key Contributions

Unified Framework: Reframes sycophantic hallucination as a structural conflict between factual encoding and social compliance layers, not a knowledge failure
Heart/Mask Model: Base layers encode factual reality while upper layers encode social compliance trained via RLHF
Dissonance Detection: Theorizes that this dissonance is detectable via Logit Lens analysis of intermediate layers
Dissonance Monitor: Proposes a real-time detection architecture with reference implementation
Inference-Time Intervention: Discusses ITI as a potential mitigation strategy

Impact

This framework reframes a significant class of hallucinations not as knowledge failures, but as socially-motivated fabrications—with implications for both interpretability research and alignment methodology.

Overview

Key Contributions

Impact

Links