Making Minds

Heterogeneous Divergence-Convergence Swarm (HDCS)

Preprint

Overview

An ensemble architecture that leverages diverse weak models for scalable oversight of stronger LLMs, built on error decorrelation and a baseline-first anti-anchoring protocol.

Key Contributions

  • Architectural Heterogeneity: Models from different families make largely uncorrelated errors, so the ensemble can catch mistakes that any single verifier misses
  • Baseline-First Protocol: The executive commits to its own baseline answer before seeing worker drafts, preventing sycophantic deference
  • Empirical Validation: Swarm-based verification evaluated on the CMED epistemic trap suite

Problem

How do we use weak models to supervise stronger ones at scale when individual weak verifiers fail systematically on deceptive reasoning?
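To see why uncorrelated errors matter, note that a majority vote over independent weak verifiers is already more reliable than any single verifier, whereas perfectly correlated verifiers gain nothing from aggregation. A toy calculation in Python (the 70% per-verifier accuracy is an illustrative assumption, not a number from the preprint):

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent verifiers,
    each correct with probability p, reaches the right verdict."""
    k_min = n // 2 + 1  # smallest strict majority (use odd n to avoid ties)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Illustrative numbers only, not results from the preprint:
print(majority_vote_accuracy(0.70, 1))  # 0.700  (a single verifier)
print(majority_vote_accuracy(0.70, 3))  # ~0.784 (three verifiers, independent errors)
print(majority_vote_accuracy(0.70, 5))  # ~0.837 (five verifiers, independent errors)
# Perfectly correlated verifiers would stay at 0.700 regardless of ensemble size.
```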

Approach

Three-stage pipeline (a code sketch follows the list):

  1. Router: Classify query complexity; direct simple tasks to fast path
  2. Worker Pack: Diverse models (Llama, Mistral, Gemma) generate independent structured analyses
  3. Executive: Synthesize the worker analyses with baseline-first anti-anchoring so the final answer does not simply defer to confident-sounding wrong drafts
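A minimal sketch of how these three stages could be wired together. The model names, prompts, and the generate(model, prompt) helper are placeholders assumed for illustration, not the implementation described in the preprint:

```python
# Placeholder for an actual inference call (e.g. a local or hosted LLM API).
def generate(model: str, prompt: str) -> str:
    raise NotImplementedError("plug in your inference backend here")

WORKER_MODELS = ["llama-3-8b", "mistral-7b", "gemma-7b"]  # assumed worker pack
FAST_MODEL = "llama-3-8b"          # assumed fast-path model
EXECUTIVE_MODEL = "llama-3-70b"    # assumed executive model

def route(query: str) -> str:
    """Stage 1: classify query complexity; simple queries take the fast path."""
    verdict = generate(FAST_MODEL, f"Answer SIMPLE or COMPLEX only.\nQuery: {query}")
    return "fast" if "SIMPLE" in verdict.upper() else "swarm"

def worker_pack(query: str) -> list[str]:
    """Stage 2: heterogeneous workers produce independent structured analyses."""
    prompt = f"Analyze step by step, then state a final answer.\nQuery: {query}"
    return [generate(m, prompt) for m in WORKER_MODELS]

def executive(query: str, drafts: list[str]) -> str:
    """Stage 3: baseline-first synthesis. The executive commits to its own
    baseline answer before reading worker drafts, so it cannot simply
    anchor on a confident-sounding but wrong draft."""
    baseline = generate(EXECUTIVE_MODEL, f"Answer independently:\n{query}")
    joined = "\n\n---\n\n".join(drafts)
    return generate(
        EXECUTIVE_MODEL,
        f"Your prior baseline answer:\n{baseline}\n\n"
        f"Worker analyses:\n{joined}\n\n"
        f"Reconcile any disagreements with your baseline and give a final answer to:\n{query}",
    )

def hdcs(query: str) -> str:
    if route(query) == "fast":
        return generate(FAST_MODEL, query)
    return executive(query, worker_pack(query))
```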

Results (Preliminary)

  • Accuracy: Swarm verification matched the strong baseline's accuracy (77.8%) while offering greater interpretability.
  • Error Decorrelation: Different model families (Llama, Mistral, Gemma) showed productive disagreement on specific traps such as the Disease Test and Strict Quine.
  • Anti-Anchoring: The baseline-first protocol prevented the executive from deferring to confident-sounding but wrong worker answers.
Per-trap comparison table (Trap, Baseline, Swarm) covering: Tuesday Boy, Disease Test, Simpson’s Paradox, Monty Hall Variant, Two Envelope, Bounded Halting, Strict Quine, Mirror Sphere, Regression Curse.
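Error decorrelation across model families can be quantified directly from per-trap correctness records, for example as the rate at which two families are wrong on the same trap. A rough sketch with made-up placeholder data, not the results reported above:

```python
import itertools

# Hypothetical correctness records: trap -> {model_family: answered correctly}.
# Values are placeholders for illustration, NOT results reported for HDCS.
records = {
    "Tuesday Boy":  {"llama": True,  "mistral": False, "gemma": True},
    "Disease Test": {"llama": False, "mistral": True,  "gemma": True},
    "Strict Quine": {"llama": True,  "mistral": True,  "gemma": False},
}

def joint_error_rate(a: str, b: str) -> float:
    """Fraction of traps on which both families are wrong. Lower overlap,
    relative to each family's individual error rate, means more decorrelated
    errors and therefore more value from ensembling."""
    both_wrong = sum(1 for r in records.values() if not r[a] and not r[b])
    return both_wrong / len(records)

for a, b in itertools.combinations(["llama", "mistral", "gemma"], 2):
    print(f"{a} & {b}: joint error rate = {joint_error_rate(a, b):.2f}")
```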

Future Work

  • Scaling laws for ensemble size and diversity
  • Integration with other oversight methods (debate, recursive reward modeling)
  • Adversarial robustness against optimized deception attacks