TAM-Eval (Test Automated Maintenance Evaluation) is a framework and benchmark designed to rigorously evaluate Large Language Models (LLMs) across the full spectrum of test suite maintenance tasks: creation, repair, and updating. It operates at the test file level with full repository context, reflecting real-world software engineering workflows.
This whole-file, in-repository setting is intended to be more realistic than earlier evaluation approaches that work on isolated snippets, and initial results show that current LLMs struggle with these complex maintenance tasks.
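To make the task setup concrete, the sketch below shows one way a file-level maintenance task with full repository context might be represented and turned into a model prompt. The names (`TaskType`, `MaintenanceTask`, `build_prompt`) and their fields are illustrative assumptions for this sketch, not TAM-Eval's actual data format or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class TaskType(Enum):
    """The three maintenance tasks the benchmark covers."""
    CREATE = "create"   # write a new test file for code that lacks tests
    REPAIR = "repair"   # fix a failing or broken test file
    UPDATE = "update"   # adapt an existing test file to changed source code


@dataclass
class MaintenanceTask:
    """One benchmark instance: a whole test file plus its repository context."""
    task_type: TaskType
    repo_files: Dict[str, str]            # path -> source text (repository context)
    target_test_path: str                 # the test file the model must create or edit
    existing_test: Optional[str] = None   # present for repair/update, absent for creation


def build_prompt(task: MaintenanceTask, max_context_chars: int = 20_000) -> str:
    """Assemble a file-level prompt: repository context first, then the target test file."""
    parts = [f"Task: {task.task_type.value} the test file {task.target_test_path}"]
    budget = max_context_chars
    for path, source in task.repo_files.items():
        chunk = f"# File: {path}\n{source}"
        if len(chunk) > budget:
            break                          # stop once the context budget is spent
        parts.append(chunk)
        budget -= len(chunk)
    if task.existing_test is not None:
        parts.append(f"# Current contents of {task.target_test_path}:\n{task.existing_test}")
    return "\n\n".join(parts)


# Example: an "update" task where a renamed function broke an existing test.
task = MaintenanceTask(
    task_type=TaskType.UPDATE,
    repo_files={"calc.py": "def add_numbers(a, b):\n    return a + b\n"},
    target_test_path="tests/test_calc.py",
    existing_test="from calc import add\n\ndef test_add():\n    assert add(1, 2) == 3\n",
)
print(build_prompt(task))
```

In a setup like this, the model's output for the target test file would then be checked against the repository, for example by running the produced tests and comparing outcomes.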