Making Minds

Synthesis: A Federated Capability Ecosystem for Safe AI Self-Extension

preprint

Overview

Synthesis is a federated capability ecosystem that enables AI agents to safely acquire, compose, and share capabilities. Rather than generating code and hoping it works, Synthesis generates comprehensive tests first, then iteratively refines implementations until all tests pass.

Key Contributions

  • Test-Driven Synthesis: Generate tests before code; iterate until all tests pass
  • Graduated Trust System: Capabilities earn privileges through demonstrated reliability (UNTRUSTED -> PROBATION -> TRUSTED -> VERIFIED)
  • Composition Over Creation: Exhaustively search and compose existing capabilities before synthesizing new ones
  • Trust Bootstrapping: Protocol for seeding trust networks with founding validators and seed capabilities

Problem

How do we enable AI systems to extend their own capabilities safely when LLM-generated code is often syntactically correct but logically flawed?

Approach

Four-stage resolution pipeline:

  1. Exchange Search: Query shared repository for verified capabilities
  2. Local Cache: Check locally cached capabilities from previous syntheses
  3. Composition: Attempt to chain existing capabilities (if coverage >= 70%)
  4. TDD Synthesis: Generate tests -> generate code -> refine until tests pass
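The four stages above can be read as a fallback chain. The following is a minimal sketch of that ordering, not the Synthesis implementation: the function `resolve`, the `Resolution` record, and the four callables are hypothetical names introduced here. How "coverage" is measured is not specified in this summary, so the sketch simply assumes the composition step reports a fraction between 0 and 1.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# From the text: composition is only attempted when coverage >= 70%.
COMPOSITION_COVERAGE_THRESHOLD = 0.70


@dataclass
class Resolution:
    source: str      # which stage produced the capability
    capability: str  # stand-in for a real capability object (assumption)


def resolve(
    requirement: str,
    search_exchange: Callable[[str], Optional[str]],
    search_cache: Callable[[str], Optional[str]],
    compose: Callable[[str], Tuple[Optional[str], float]],
    synthesize: Callable[[str], str],
) -> Resolution:
    """Try each stage in order and return the first capability found."""
    # Stage 1: shared repository of verified capabilities.
    found = search_exchange(requirement)
    if found is not None:
        return Resolution("exchange", found)

    # Stage 2: capabilities cached locally from previous syntheses.
    found = search_cache(requirement)
    if found is not None:
        return Resolution("cache", found)

    # Stage 3: chain existing capabilities, but only above the coverage bar.
    composed, coverage = compose(requirement)
    if composed is not None and coverage >= COMPOSITION_COVERAGE_THRESHOLD:
        return Resolution("composition", composed)

    # Stage 4: fall back to test-driven synthesis of a new capability.
    return Resolution("synthesis", synthesize(requirement))
```

Passing the stages in as callables keeps the ordering logic separate from how each stage is actually implemented, which this summary does not describe.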

Core Components

TDD Synthesizer

  • Generates 5-10 test cases from requirements (normal cases, edge cases, error conditions)
  • Iteratively refines implementation (max 5 iterations)
  • Creates validated Capability objects with full test suites
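The refinement loop described above might look roughly like the sketch below. It is illustrative only: `Case`, `failing_cases`, and `synthesize_with_tests` are hypothetical names, the `generate` and `refine` callables stand in for LLM calls, and each test is simplified to an input/expected-output pair (tests for error conditions would need a richer representation). Only the 5-iteration cap comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

MAX_ITERATIONS = 5  # from the text: at most 5 refinement iterations

Implementation = Callable[..., object]


@dataclass
class Case:
    args: tuple       # positional arguments passed to the implementation
    expected: object  # value the implementation must return


def failing_cases(impl: Implementation, cases: List[Case]) -> List[Case]:
    """Return the cases the implementation fails; an exception counts as a failure."""
    failures = []
    for case in cases:
        try:
            passed = impl(*case.args) == case.expected
        except Exception:
            passed = False
        if not passed:
            failures.append(case)
    return failures


def synthesize_with_tests(
    cases: List[Case],
    generate: Callable[[], Implementation],
    refine: Callable[[Implementation, List[Case]], Implementation],
    max_iterations: int = MAX_ITERATIONS,
) -> Tuple[Optional[Implementation], int]:
    """Tests come first; the candidate is refined until every case passes."""
    impl = generate()
    for iteration in range(1, max_iterations + 1):
        failures = failing_cases(impl, cases)
        if not failures:
            return impl, iteration  # all tests pass
        if iteration < max_iterations:
            # Feed the failing cases back so the next attempt can target them.
            impl = refine(impl, failures)
    return None, max_iterations  # gave up without a passing implementation
```

Returning `None` on exhaustion makes the failure explicit, so a caller cannot mistake an unvalidated candidate for a validated capability.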

Trust Manager

Level       Requirements                             Permissions
UNTRUSTED   New capability                           Max isolation, no network/files
PROBATION   10+ runs, 70%+ success                   Limited resources
TRUSTED     50+ runs, 85%+ success                   Standard execution
VERIFIED    200+ runs, 95%+ success, human review    Full privileges
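The thresholds in the table can be expressed as a small classification function. This is a sketch under assumptions: `TrustLevel` and `trust_level` are hypothetical names, and the summary does not say whether success rates are computed over all runs or a recent window, or whether capabilities can be demoted. The sketch uses lifetime totals and re-evaluates the level from scratch on each call.

```python
from enum import IntEnum


class TrustLevel(IntEnum):
    UNTRUSTED = 0
    PROBATION = 1
    TRUSTED = 2
    VERIFIED = 3


def trust_level(runs: int, successes: int, human_reviewed: bool = False) -> TrustLevel:
    """Map execution history to a trust level using the thresholds in the table."""
    success_rate = successes / runs if runs > 0 else 0.0

    # Check the strictest level first so a capability gets the highest level it meets.
    if runs >= 200 and success_rate >= 0.95 and human_reviewed:
        return TrustLevel.VERIFIED
    if runs >= 50 and success_rate >= 0.85:
        return TrustLevel.TRUSTED
    if runs >= 10 and success_rate >= 0.70:
        return TrustLevel.PROBATION
    return TrustLevel.UNTRUSTED  # new or insufficiently proven capability
```

Note that in this reading a capability with 200+ runs and a 95%+ success rate stays at TRUSTED until a human review has taken place.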

Trust Scoring

S_composite = 0.4 * R_execution + 0.4 * S_validation + 0.2 * S_community
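As a worked example of the formula above, the sketch below computes the composite score for one set of inputs. The function name `composite_trust_score` is hypothetical, and the assumption that each component is normalized to the range 0 to 1 is not stated in this summary; only the weights come from the text.

```python
def composite_trust_score(r_execution: float, s_validation: float, s_community: float) -> float:
    """Weighted sum from the text: 0.4 * execution + 0.4 * validation + 0.2 * community."""
    for name, value in (
        ("r_execution", r_execution),
        ("s_validation", s_validation),
        ("s_community", s_community),
    ):
        # Assumption: every component score lies in [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return 0.4 * r_execution + 0.4 * s_validation + 0.2 * s_community


# 0.4 * 0.9 + 0.4 * 0.8 + 0.2 * 0.5 = 0.36 + 0.32 + 0.10 = 0.78
score = composite_trust_score(r_execution=0.9, s_validation=0.8, s_community=0.5)
```

Because execution and validation together carry 80% of the weight, community feedback alone cannot lift a capability with poor measured reliability to a high score.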

Honest Metrics

Metric                             Value
One-shot synthesis success         40-60%
After refinement (5 iterations)    70-85%
Complex multi-dependency tasks     50-70%
Average iterations to success      2.3

Safety Features

  1. Static Analysis: AST checks for forbidden imports (os, subprocess, socket, pickle)
  2. Container Isolation: Docker containers with no host mounts
  3. Resource Limits: 512 MB memory, 30 s timeout (adjusted by trust level)
  4. Network Control: Disabled for UNTRUSTED, dependency-install only for PROBATION
  5. Audit Logging: Complete execution records for forensics
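The static analysis step in item 1 can be illustrated with Python's standard `ast` module. This is a minimal sketch, not the project's actual checker: `FORBIDDEN_MODULES` mirrors the four modules named above, while the function name `forbidden_imports` is hypothetical. It only inspects `import` and `from ... import` statements, so dynamic imports such as `__import__("os")` or `importlib.import_module` would pass unnoticed, which is one reason to pair a check like this with container isolation.

```python
import ast
from typing import Set

# The four modules named in the text.
FORBIDDEN_MODULES = {"os", "subprocess", "socket", "pickle"}


def forbidden_imports(source: str) -> Set[str]:
    """Return the forbidden top-level modules imported anywhere in the source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. "import os.path as p" yields the name "os.path"
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports like "from . import x"
            names = [node.module] if node.module else []
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" counts as "os"
            if root in FORBIDDEN_MODULES:
                found.add(root)
    return found
```

A caller would reject generated code whenever the returned set is non-empty, before the code ever reaches a container.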

Philosophical Foundation

  • Capabilities earn trust through demonstrated competence, not assumed correctness
  • AI systems as partners: objective validation rather than assumed trust
  • Honest metrics: report real success rates, not marketing claims
  • Collaborative sharing creates network effects