LLADBench (LLM-driven Learning-based Anomaly Detection Benchmark) is an evaluation framework designed to assess the performance of Large Language Model (LLM)-driven anomaly detection (AD) systems, particularly on time series data. It targets critical limitations of existing methods: inadequate reasoning ability, deficient multi-turn dialogue capability, and narrow generalization. The benchmark provides a standardized platform for evaluating models such as the ChatAD family and nine other baselines across seven diverse datasets and tasks, using metrics such as accuracy, F1 score, and false positive rate. This comprehensive evaluation lets researchers and ML engineers rigorously compare and advance the state of the art in explainable, generalizable LLM-driven anomaly detection, supporting applications such as predictive maintenance, fraud detection, and system monitoring.
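LLADBench's actual evaluation harness is not shown here, but as a minimal sketch of the metrics it reports, the snippet below computes accuracy, F1 score, and false positive rate from point-wise binary anomaly labels. The function name `evaluate_point_labels` and the `ADMetrics` container are illustrative assumptions, not part of LLADBench's API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ADMetrics:
    accuracy: float
    f1: float
    false_positive_rate: float


def evaluate_point_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> ADMetrics:
    """Compute point-wise AD metrics from binary labels (1 = anomaly, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard denominators so empty classes yield 0 rather than a ZeroDivisionError.
    accuracy = (tp + tn) / max(tp + tn + fp + fn, 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    fpr = fp / max(fp + tn, 1)
    return ADMetrics(accuracy=accuracy, f1=f1, false_positive_rate=fpr)


# Example: ground-truth labels vs. detector output on a short series.
truth = [0, 0, 1, 1, 0, 0, 0, 1]
preds = [0, 1, 1, 0, 0, 0, 0, 1]
print(evaluate_point_labels(truth, preds))
# ADMetrics(accuracy=0.75, f1=0.666..., false_positive_rate=0.2)
```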
In plain terms, LLADBench is a new tool for testing how well AI models built on large language models can spot unusual patterns in data, such as time series. It helps researchers see which models are best at explaining their findings, holding multi-turn conversations, and handling different types of problems.