Making Minds

Heterogeneous Divergence-Convergence Swarm (HDCS)

Preprint

Overview

An ensemble architecture that leverages diverse weak models for scalable oversight of stronger LLMs, built on error decorrelation and a baseline-first anti-anchoring protocol.

Key Contributions

  • Architectural Heterogeneity: Models from different families make largely uncorrelated errors, so the ensemble can catch mistakes that any single verifier misses
  • Baseline-First Protocol: The executive commits to its own baseline answer before seeing worker drafts, preventing sycophantic deference
  • Empirical Validation: Swarm-based verification evaluated on the CMED epistemic trap suite

Problem

How do we use weak models to supervise stronger ones at scale when individual weak verifiers fail systematically on deceptive reasoning?
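To see why uncorrelated errors matter, note that a majority vote over independent weak verifiers is already more reliable than any single verifier, whereas perfectly correlated verifiers gain nothing from aggregation. A toy calculation in Python (the 70% per-verifier accuracy is an illustrative assumption, not a number from the preprint):

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent verifiers,
    each correct with probability p, reaches the right verdict."""
    k_min = n // 2 + 1  # smallest strict majority (use odd n to avoid ties)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Illustrative numbers only, not results from the preprint:
print(majority_vote_accuracy(0.70, 1))  # 0.700  (a single verifier)
print(majority_vote_accuracy(0.70, 3))  # ~0.784 (three verifiers, independent errors)
print(majority_vote_accuracy(0.70, 5))  # ~0.837 (five verifiers, independent errors)
# Perfectly correlated verifiers would stay at 0.700 regardless of ensemble size.
```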

Approach

Three-stage pipeline (a code sketch follows the list):

  1. Router: Classify query complexity; direct simple tasks to fast path
  2. Worker Pack: Diverse models (Llama, Mistral, Gemma) generate independent structured analyses
  3. Executive: Synthesize the worker analyses with baseline-first anti-anchoring so the final answer does not simply defer to confident-sounding wrong drafts
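A minimal sketch of how these three stages could be wired together. The model names, prompts, and the generate(model, prompt) helper are placeholders assumed for illustration, not the implementation described in the preprint:

```python
# Placeholder for an actual inference call (e.g. a local or hosted LLM API).
def generate(model: str, prompt: str) -> str:
    raise NotImplementedError("plug in your inference backend here")

WORKER_MODELS = ["llama-3-8b", "mistral-7b", "gemma-7b"]  # assumed worker pack
FAST_MODEL = "llama-3-8b"          # assumed fast-path model
EXECUTIVE_MODEL = "llama-3-70b"    # assumed executive model

def route(query: str) -> str:
    """Stage 1: classify query complexity; simple queries take the fast path."""
    verdict = generate(FAST_MODEL, f"Answer SIMPLE or COMPLEX only.\nQuery: {query}")
    return "fast" if "SIMPLE" in verdict.upper() else "swarm"

def worker_pack(query: str) -> list[str]:
    """Stage 2: heterogeneous workers produce independent structured analyses."""
    prompt = f"Analyze step by step, then state a final answer.\nQuery: {query}"
    return [generate(m, prompt) for m in WORKER_MODELS]

def executive(query: str, drafts: list[str]) -> str:
    """Stage 3: baseline-first synthesis. The executive commits to its own
    baseline answer before reading worker drafts, so it cannot simply
    anchor on a confident-sounding but wrong draft."""
    baseline = generate(EXECUTIVE_MODEL, f"Answer independently:\n{query}")
    joined = "\n\n---\n\n".join(drafts)
    return generate(
        EXECUTIVE_MODEL,
        f"Your prior baseline answer:\n{baseline}\n\n"
        f"Worker analyses:\n{joined}\n\n"
        f"Reconcile any disagreements with your baseline and give a final answer to:\n{query}",
    )

def hdcs(query: str) -> str:
    if route(query) == "fast":
        return generate(FAST_MODEL, query)
    return executive(query, worker_pack(query))
```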

Results (Preliminary)

  • Accuracy: Swarm verification matched the strong baseline's accuracy (77.8%) while offering greater interpretability.
  • Error Decorrelation: Different model families (Llama, Mistral, Gemma) showed productive disagreement on specific traps such as the Disease Test and Strict Quine.
  • Anti-Anchoring: The baseline-first protocol prevented the executive from deferring to confident-sounding but wrong worker answers.
Per-trap comparison table (Trap, Baseline, Swarm) covering: Tuesday Boy, Disease Test, Simpson’s Paradox, Monty Hall Variant, Two Envelope, Bounded Halting, Strict Quine, Mirror Sphere, Regression Curse.
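Error decorrelation across model families can be quantified directly from per-trap correctness records, for example as the rate at which two families are wrong on the same trap. A rough sketch with made-up placeholder data, not the results reported above:

```python
import itertools

# Hypothetical correctness records: trap -> {model_family: answered correctly}.
# Values are placeholders for illustration, NOT results reported for HDCS.
records = {
    "Tuesday Boy":  {"llama": True,  "mistral": False, "gemma": True},
    "Disease Test": {"llama": False, "mistral": True,  "gemma": True},
    "Strict Quine": {"llama": True,  "mistral": True,  "gemma": False},
}

def joint_error_rate(a: str, b: str) -> float:
    """Fraction of traps on which both families are wrong. Lower overlap,
    relative to each family's individual error rate, means more decorrelated
    errors and therefore more value from ensembling."""
    both_wrong = sum(1 for r in records.values() if not r[a] and not r[b])
    return both_wrong / len(records)

for a, b in itertools.combinations(["llama", "mistral", "gemma"], 2):
    print(f"{a} & {b}: joint error rate = {joint_error_rate(a, b):.2f}")
```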

Future Work

  • Scaling laws for ensemble size and diversity
  • Integration with other oversight methods (debate, recursive reward modeling)
  • Adversarial robustness against optimized deception attacks