A retrieval-augmented multi-agent framework automates the generation of instance-specific evaluation rubrics for LLMs, particularly in high-stakes domains like clinical decision support. It grounds evaluation in authoritative evidence by decomposing retrieved content and synthesizing it with user constraints to create verifiable, fine-grained criteria.
This framework helps make large AI models safer and more reliable, especially in critical areas like healthcare. It automatically creates detailed checklists to evaluate AI responses by using trusted medical information, significantly improving accuracy and reducing harmful suggestions compared to current methods.
RA-MAF, Multi-Agent Retrieval Framework, Evidence-Grounded Multi-Agent Evaluation
Was this definition helpful?