Attention calibration is an inference-time method that redistributes attention more evenly across document positions in embedding models. It mitigates positional bias, in which early segments are over-represented, thereby increasing the discoverability of later segments in long documents.
In plain terms: embedding models used for search often under-weight the middle and end of long documents. This method rebalances the model's attention so that it covers all parts of a document, not just the beginning, making content anywhere in the document discoverable.
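As an illustration of the idea (not the method's exact formula, which the entry does not specify), one simple calibration rule is to blend the raw per-position attention weights toward a uniform distribution. The function name, the linear-interpolation rule, and the `alpha` parameter below are illustrative assumptions:

```python
import numpy as np

def calibrate_attention(attn, alpha=0.5):
    """Blend raw attention weights toward a uniform distribution.

    attn: 1-D array of nonnegative per-position weights summing to 1.
    alpha: 0 keeps the original attention; 1 makes it fully uniform.
    (Illustrative sketch; the actual calibration rule may differ.)
    """
    uniform = np.full_like(attn, 1.0 / attn.size)
    calibrated = (1 - alpha) * attn + alpha * uniform
    # Renormalize so the weights still sum to 1.
    return calibrated / calibrated.sum()

# Positionally biased attention: early positions dominate.
raw = np.array([0.5, 0.3, 0.1, 0.06, 0.04])
calibrated = calibrate_attention(raw, alpha=0.5)
# Later positions gain weight; early positions lose it.
print(calibrated)
```

With `alpha=0.5`, the last position's weight rises (from 0.04 to 0.12) while the first falls (from 0.5 to 0.35), so later segments contribute more to the document representation.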
attention redistribution, positional bias mitigation, fair attention