State-space models (SSMs) offer efficient sequence modeling but lag behind Transformers on benchmarks that require in-context retrieval. Prior work links this gap to a small set of attention heads, te...
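To make the efficiency/retrieval trade-off concrete, here is a minimal sketch of a generic diagonal linear SSM recurrence (not any specific model from the abstract above); the parameter names, shapes, and constants are illustrative assumptions. The point is that each step updates a fixed-size state in O(N) total time, so all past tokens must be compressed into that state, whereas attention can re-read any earlier token directly when retrieving in-context information.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal diagonal linear SSM: h_t = A * h_{t-1} + B x_t,  y_t = C h_t.

    x: (N, d_in) input sequence; A: (d_state,) diagonal transition;
    B: (d_state, d_in) input projection; C: (d_out, d_state) readout.
    Cost is O(N) in sequence length because every step touches only a
    fixed-size hidden state.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A * h + B @ x_t   # compress the new token into the running state
        ys.append(C @ h)      # output depends only on the compressed state
    return np.stack(ys)

rng = np.random.default_rng(0)
N, d_in, d_state, d_out = 128, 16, 32, 16
x = rng.standard_normal((N, d_in))
A = np.full(d_state, 0.95)                      # stable decay on the diagonal
B = rng.standard_normal((d_state, d_in)) * 0.1
C = rng.standard_normal((d_out, d_state)) * 0.1
y = ssm_scan(x, A, B, C)
print(y.shape)  # (128, 16)
```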
Diffusion Transformers (DiTs) incur prohibitive computational costs due to the quadratic scaling of self-attention. Existing pruning methods fail to simultaneously satisfy differentiability, efficienc...
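A back-of-the-envelope sketch of the quadratic-cost claim: for a DiT operating on a latent grid, the token count grows quadratically with the grid side length, and self-attention FLOPs grow quadratically with the token count. The patch size and hidden width below are illustrative assumptions, not values from the work above.

```python
# Why DiT self-attention cost explodes with resolution: doubling the latent
# side length gives ~4x tokens and ~16x attention FLOPs per layer.

def attn_flops_per_layer(n_tokens: int, d_model: int) -> int:
    # QK^T and (scores @ V) each cost roughly n^2 * d multiply-adds.
    return 2 * n_tokens * n_tokens * d_model

patch, d_model = 2, 1024                    # illustrative patch size and width
for side in (32, 64, 128):                  # latent grid side length
    n = (side // patch) ** 2                # tokens grow quadratically in side
    print(f"{side}x{side} latent -> {n:6d} tokens, "
          f"~{attn_flops_per_layer(n, d_model) / 1e9:6.1f} GFLOPs/layer in attention")
```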
Linear attention methods offer Transformers $O(N)$ complexity but typically underperform standard softmax attention. We identify two fundamental limitations affecting these approaches: the restriction...
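The complexity contrast can be seen directly in how the matrix products are ordered. Below is a minimal sketch, not the method proposed above: standard softmax attention materializes an N x N score matrix (O(N^2 d)), while a generic kernelized linear attention with a non-negative feature map reorders the computation around a d x d summary (O(N d^2)). The feature map and shapes are illustrative assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: builds an (N, N) score matrix -> O(N^2 d) time, O(N^2) memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized linear attention: phi(Q) (phi(K)^T V) avoids the N x N matrix
    # entirely -> O(N d^2) time with a fixed-size (d, d_v) key-value summary.
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                       # (d, d_v) summary of keys and values
    z = Qf @ Kf.sum(axis=0)             # per-query normalizer
    return (Qf @ kv) / z[:, None]

N, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)  # (512, 64)
print(linear_attention(Q, K, V).shape)   # (512, 64); values differ since the kernels differ
```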