Back-flow of distinguishability is a model-agnostic witness of training memory in neural networks: it quantifies non-Markovianity by detecting increases in the distinguishability of model outcomes after sequential interventions. It provides a principled diagnostic for understanding how optimizer and data states influence training dynamics.
Back-flow of distinguishability is a way to measure how much "memory" a neural network's training process has, showing that training is not always a memoryless, step-by-step (Markovian) process. It helps researchers understand how past training decisions, like optimizer settings or data batches, continue to influence future learning.
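The idea above can be illustrated with a minimal toy sketch. The names (`run`, `backflow_witness`) and the setup are hypothetical, not a standard implementation: real uses of the witness compare distributions of model outcomes under sequential interventions, whereas this toy uses the parameter distance between two runs of gradient descent on a quadratic loss as a stand-in for distinguishability. Momentum carries optimizer state (memory), so the distance between runs can grow again after shrinking, which the witness picks up; plain memoryless gradient descent contracts monotonically and yields a witness of zero.

```python
import numpy as np

def run(w0, steps, lr=0.5, mu=0.0):
    """Gradient descent with momentum mu on f(w) = w**2 / 2 (grad = w)."""
    w, v = w0, 0.0
    traj = [w]
    for _ in range(steps):
        v = mu * v - lr * w
        w = w + v
        traj.append(w)
    return np.array(traj)

def backflow_witness(d):
    """Sum of positive increments of the distinguishability series d(t).

    Any increase after a decrease signals back-flow, i.e. memory
    (non-Markovianity) in the training dynamics.
    """
    return float(np.sum(np.maximum(np.diff(d), 0.0)))

steps = 40
# Distinguishability proxy: distance between two runs from different starts.
d_sgd = np.abs(run(1.0, steps, mu=0.0) - run(-1.0, steps, mu=0.0))
d_mom = np.abs(run(1.0, steps, mu=0.9) - run(-1.0, steps, mu=0.9))

print(backflow_witness(d_sgd))  # 0.0: memoryless GD contracts monotonically
print(backflow_witness(d_mom))  # > 0: momentum state causes back-flow
```

With momentum the difference between the two trajectories follows an underdamped linear system, so the distance oscillates while decaying; those oscillations are exactly the "back-flow" the witness sums up.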
BFD, training memory witness, non-Markovianity witness