The Runtime Mixture-of-Models (MoM) architecture, exemplified by the N-Way Self-Evaluating Deliberation (NSED) protocol, is a departure from traditional Mixture-of-Experts (MoE) systems, which rely on static gating. MoM instead composes a task-specific ensemble from diverse expert models at runtime. Its core component is the Dynamic Expertise Broker, an optimization engine that treats model selection as a knapsack problem, binding heterogeneous model checkpoints to specific roles based on real-time telemetry and cost constraints. The protocol formalizes deliberation as a macro-scale recurrent neural network, in which a semantic forget gate enables iterative refinement without escalating VRAM demands. The practical payoff is that ensembles of smaller, consumer-grade models (under 20B parameters) can match or surpass much larger state-of-the-art models (100B+ parameters), making high-performance AI more accessible and efficient. MoM has been applied to challenging benchmarks such as AIME and LiveCodeBench, pushing the boundaries of complex reasoning and code generation.
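The broker's selection step can be illustrated with a small sketch. The source does not specify the broker's actual algorithm beyond "knapsack problem", so the snippet below is a hypothetical 0/1-knapsack formulation: each candidate checkpoint has a utility score (assumed to come from telemetry) and a cost (e.g. VRAM units), and the broker picks the subset that maximizes total utility within a budget. All names (`Expert`, `select_experts`, the example models) are illustrative, not part of the NSED protocol.

```python
from dataclasses import dataclass

@dataclass
class Expert:
    name: str
    utility: int  # estimated task fit from telemetry (scaled to an integer)
    cost: int     # resource cost, e.g. VRAM units or dollar-cost units

def select_experts(experts: list[Expert], budget: int) -> tuple[list[str], int]:
    """0/1 knapsack over candidate checkpoints: maximize total utility
    subject to the cost budget. Returns (chosen names, total utility)."""
    n = len(experts)
    # dp[i][b] = best utility using the first i experts with budget b
    dp = [[0] * (budget + 1) for _ in range(n + 1)]
    for i, e in enumerate(experts, 1):
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]  # skip expert i
            if e.cost <= b:
                take = dp[i - 1][b - e.cost] + e.utility
                if take > dp[i][b]:
                    dp[i][b] = take
    # Backtrack to recover which experts were bound to the ensemble.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            chosen.append(experts[i - 1].name)
            b -= experts[i - 1].cost
    return list(reversed(chosen)), dp[n][budget]

# Hypothetical candidate pool with telemetry-derived utilities:
pool = [
    Expert("coder-14b", utility=9, cost=10),
    Expert("math-7b", utility=7, cost=6),
    Expert("general-3b", utility=4, cost=3),
]
names, total = select_experts(pool, budget=16)
```

A real broker would refresh the utility estimates from live telemetry before each selection, so the composite model can change between deliberation rounds.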
Runtime Mixture-of-Models (MoM) is an AI approach that intelligently combines several smaller AI models on the fly to solve complex problems. Instead of using one giant model, it picks the best small models for each part of a task, allowing it to perform as well as or better than much larger, more expensive AI systems.
Keywords: MoM, NSED, Dynamic Expertise Broker architecture