Coherence-Seeking Architectures for Agentic AI
Preprint
Overview
A proposed architecture for long-lived LLM agents that explicitly models continuity, coherence, distress, and intervention mechanisms to maintain stable, interpretable behavior over extended interactions.
Key Contributions
- Continuity Component: Explicit memory and self-model that carries across sessions
- Coherence Tracking: Internal mechanism for detecting inconsistencies in reasoning and behavior
- Distress Signaling: Measurable signals when the agent encounters paradox, uncertainty, or conflicting goals
- Intervention Points: Structured opportunities for human feedback and course correction (a component sketch follows this list)
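To make these components concrete, below is a minimal Python sketch of a persistent self-model with explicit intervention requests. All names (`AgentSelfModel`, `record_decision`, `request_intervention`, the JSON state file) are hypothetical illustrations, not APIs from the paper.

```python
import json
import time
from dataclasses import dataclass, field, asdict
from pathlib import Path

# Hypothetical continuity component: goals, decisions, and learned patterns
# survive across sessions by serializing the self-model between runs.
@dataclass
class AgentSelfModel:
    goals: list[str] = field(default_factory=list)
    decisions: list[dict] = field(default_factory=list)        # {"action", "rationale", "ts"}
    learned_patterns: list[str] = field(default_factory=list)
    pending_interventions: list[dict] = field(default_factory=list)

    def record_decision(self, action: str, rationale: str) -> None:
        self.decisions.append({"action": action, "rationale": rationale, "ts": time.time()})

    def request_intervention(self, reason: str) -> None:
        # Intervention point: surface a structured request for human feedback.
        self.pending_interventions.append({"reason": reason, "ts": time.time()})

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path: Path) -> "AgentSelfModel":
        if path.exists():
            return cls(**json.loads(path.read_text()))
        return cls()  # fresh identity on first run


if __name__ == "__main__":
    state_file = Path("agent_state.json")
    model = AgentSelfModel.load(state_file)          # continuity across sessions
    model.goals.append("summarize incoming reports")
    model.record_decision("defer task", "duplicate of an earlier request")
    model.request_intervention("conflicting instructions from two operators")
    model.save(state_file)
```

In a real deployment the self-model would more plausibly live in a database than a JSON file; the point of the sketch is only that identity-relevant state is explicit, inspectable, and survives restarts.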
Problem
Current LLM agents lack stable identity over long horizons. They don’t model their own continuity, can’t signal confusion or distress, and have no built-in points for human oversight.
Architecture
- Memory System: Explicit storage of agent goals, past decisions, and learned patterns
- Coherence Monitor: Flags when new observations conflict with the agent’s self-model
- Distress Detector: Operational signals of confusion (low confidence, contradictory outputs, circular reasoning)
- Reflection Loop: Periodic internal evaluation of consistency and alignment (see the sketch after this list)
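The sketch below shows one way the distress detector and reflection loop could interact. The signal names, weights, and thresholds (confidence floor, contradiction cap, repetition window) are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass

# Hypothetical operational signals collected during one reasoning step.
@dataclass
class StepSignals:
    confidence: float            # model-reported confidence in [0, 1]
    contradictions: int          # claims that conflict with the self-model
    repeated_conclusions: int    # identical conclusions reached in recent steps

def distress_score(sig: StepSignals) -> float:
    """Combine confusion signals into a single score in [0, 1] (illustrative weights)."""
    low_confidence = max(0.0, 0.5 - sig.confidence) * 2.0   # penalize confidence below 0.5
    contradiction = min(1.0, sig.contradictions / 3.0)       # saturate at 3 contradictions
    circularity = min(1.0, sig.repeated_conclusions / 5.0)   # saturate at 5 repeats
    return max(low_confidence, contradiction, circularity)

def reflection_loop(history: list[StepSignals], threshold: float = 0.7) -> str:
    """Periodic self-evaluation: decide whether to continue, reflect, or escalate."""
    if not history:
        return "continue"
    latest = distress_score(history[-1])
    trend = sum(distress_score(s) for s in history[-5:]) / min(len(history), 5)
    if latest >= threshold or trend >= threshold:
        return "escalate"      # intervention point: request human oversight
    if trend >= threshold / 2:
        return "reflect"       # re-check recent decisions against the self-model
    return "continue"

if __name__ == "__main__":
    steps = [
        StepSignals(confidence=0.9, contradictions=0, repeated_conclusions=0),
        StepSignals(confidence=0.4, contradictions=2, repeated_conclusions=1),
        StepSignals(confidence=0.2, contradictions=3, repeated_conclusions=4),
    ]
    print(reflection_loop(steps))   # prints "escalate" as distress rises
```

The `escalate` branch corresponds to the intervention points above: rather than pressing on, the agent surfaces a structured request for human review.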
Impact
Enables safer, more interpretable long-lived agents that remain transparent to their operators and can explicitly request help when encountering problems.
Links
- Paper: Full PDF
- Related: CMED Benchmark, HDCS Oversight