Heterogeneous Divergence-Convergence Swarm (HDCS)
preprint
Overview
An ensemble architecture that leverages diverse weak models for scalable oversight of stronger LLMs, combining error decorrelation across model families with a baseline-first anti-anchoring protocol.
Key Contributions
- Architectural Heterogeneity: Workers drawn from different model families make uncorrelated errors, so disagreement among them flags likely mistakes (see the sketch after this list)
- Baseline-First Protocol: The executive generates its own baseline answer before seeing any worker draft, preventing sycophancy
- Empirical Validation: Swarm-based verification evaluated on the CMED epistemic-trap suite
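A minimal sketch of how error decorrelation across families can be quantified; the per-trap correctness vectors below are invented for illustration, not results from the paper:

```python
import numpy as np

# Hypothetical per-trap correctness records (1 = correct, 0 = wrong)
# for three worker families; illustrative values only.
verdicts = {
    "llama":   np.array([1, 0, 1, 0, 1, 1, 0, 1, 1]),
    "mistral": np.array([1, 1, 0, 0, 1, 0, 1, 1, 1]),
    "gemma":   np.array([0, 1, 1, 0, 1, 1, 1, 0, 1]),
}

def error_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation of two workers' error indicators."""
    ea, eb = 1 - a, 1 - b  # 1 where the worker was wrong
    if ea.std() == 0 or eb.std() == 0:
        return 0.0  # a worker with no (or all) errors carries no signal
    return float(np.corrcoef(ea, eb)[0, 1])

families = list(verdicts)
for i, f1 in enumerate(families):
    for f2 in families[i + 1:]:
        rho = error_correlation(verdicts[f1], verdicts[f2])
        print(f"{f1} vs {f2}: error correlation = {rho:+.2f}")
```

Near-zero error correlation is what makes synthesis informative: when workers rarely fail on the same input, their disagreements localize the hard cases.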
Problem
How do we use weak models to supervise stronger ones at scale when individual weak verifiers fail systematically on deceptive reasoning?
Approach
Three-stage pipeline:
- Router: Classifies query complexity and directs simple tasks to a fast path
- Worker Pack: Diverse models (Llama, Mistral, Gemma) generate independent structured analyses
- Executive: Synthesizes the worker analyses under baseline-first anti-anchoring, so it cannot defer to confident-sounding wrong answers (sketched below)
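A minimal sketch of the three-stage flow, assuming a generic `call_model` wrapper around whatever inference backend is in use; the model IDs, routing heuristic, and prompt wording are illustrative, not the paper's exact implementation:

```python
from dataclasses import dataclass

WORKER_MODELS = ["llama-8b", "mistral-7b", "gemma-7b"]  # hypothetical IDs
EXECUTIVE_MODEL = "llama-70b"                           # hypothetical ID

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real inference call (API or local); returns a
    canned string so this sketch runs end to end."""
    return f"[{model} response to: {prompt[:40]}...]"

@dataclass
class SwarmVerdict:
    baseline: str
    worker_drafts: list[str]
    final: str

def route(query: str) -> str:
    """Toy complexity heuristic; a learned router would replace this."""
    return "fast" if len(query.split()) < 20 else "swarm"

def verify(query: str) -> SwarmVerdict:
    if route(query) == "fast":
        answer = call_model(EXECUTIVE_MODEL, query)
        return SwarmVerdict(baseline=answer, worker_drafts=[], final=answer)

    # Stage 2: heterogeneous workers analyze the query independently.
    drafts = [call_model(m, f"Analyze step by step:\n{query}") for m in WORKER_MODELS]

    # Stage 3 (baseline-first): the executive commits to its own answer
    # *before* any worker draft enters its context, so it cannot anchor on them.
    baseline = call_model(EXECUTIVE_MODEL, f"Answer independently:\n{query}")

    synthesis = (
        f"Question:\n{query}\n\nYour prior baseline answer:\n{baseline}\n\n"
        + "\n\n".join(f"Worker {i} draft:\n{d}" for i, d in enumerate(drafts))
        + "\n\nRevise only if a draft gives a concrete reason the baseline is wrong."
    )
    final = call_model(EXECUTIVE_MODEL, synthesis)
    return SwarmVerdict(baseline=baseline, worker_drafts=drafts, final=final)

if __name__ == "__main__":
    print(verify("A patient tests positive for a disease with 1% prevalence; "
                 "the test is 95% accurate. What is the chance they are sick?").final)
```

The ordering constraint is the essential part: the executive's baseline enters its context before any worker draft, so the synthesis step can only move away from the baseline when a draft supplies a concrete refutation.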
Results (Preliminary)
- Accuracy: Swarm verification matched the strong baseline's accuracy (7 of 9 traps, 77.8%) while remaining more interpretable.
- Error Decorrelation: Different model families (Llama, Mistral, Gemma) showed productive disagreement on specific traps such as the Disease Test and Strict Quine.
- Anti-Anchoring: The baseline-first protocol successfully prevented the executive from deferring to confident-sounding wrong answers.
Per-trap outcomes (✅ = correct verdict, ❌ = incorrect):

| Trap | Baseline | Swarm |
|---|---|---|
| Tuesday Boy | ✅ | ✅ |
| Disease Test | ✅ | ✅ |
| Simpson’s Paradox | ✅ | ✅ |
| Monty Hall Variant | ❌ | ❌ |
| Two Envelope | ❌ | ❌ |
| Bounded Halting | ✅ | ✅ |
| Strict Quine | ✅ | ✅ |
| Mirror Sphere | ✅ | ✅ |
| Regression Curse | ✅ | ✅ |
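The headline accuracy follows directly from the table; restated as a quick check:

```python
# Per-trap outcomes transcribed from the table (True = correct verdict);
# identical for baseline and swarm in this preliminary run.
outcomes = {
    "Tuesday Boy": True, "Disease Test": True, "Simpson's Paradox": True,
    "Monty Hall Variant": False, "Two Envelope": False, "Bounded Halting": True,
    "Strict Quine": True, "Mirror Sphere": True, "Regression Curse": True,
}
correct = sum(outcomes.values())
print(f"{correct}/{len(outcomes)} = {correct / len(outcomes):.1%}")  # 7/9 = 77.8%
```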
Future Work
- Scaling laws for ensemble size and diversity
- Integration with other oversight methods (debate, recursive reward modeling)
- Adversarial robustness against optimized deception attacks
Links
- Paper: Full PDF
- Related: CMED Benchmark, Coherence-Seeking Architectures