ARXIV:2604.13432 · VISION TRANSFORMERS · SUBMITTED 16 APR · 18:18 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: citation fields available (verified) · Proof: unverified proof status (partial)

MaMe & MaRe: Matrix-Based Token Merging and Restoration for Efficient Visual Perception and Synthesis

Simin Huo · Ning Li · arXiv

A GPU-friendly, training-free method for accelerating Vision Transformers and enhancing image synthesis by merging and restoring tokens using matrix operations.

Ship in 2-4 weeks · Score 8.0 · Evidence unverified

Opportunity summary

Pain: A GPU-friendly, training-free method for accelerating Vision Transformers and enhancing image synthesis by merging and restoring tokens using matrix operations.

Evidence: 0 refs | 5 sources | 67% coverage

Blocker: Evidence unverified


PROBLEM

Existing token-compression methods for Vision Transformers, such as ToMe, rely on GPU-inefficient operations (e.g., sorting and scattered writes), introducing overheads that limit their effectiveness. MaMe addresses this with a training-free token merging method built entirely on matrix operations, paired with MaRe, its inverse, for token restoration in image synthesis.
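To make the contrast concrete, here is a simplified, ToMe-style bipartite soft matching step written in NumPy. It is an illustrative approximation, not ToMe's reference code: it compares raw token features instead of attention keys and uses plain averaging instead of size-weighted merging. The function name `tome_style_merge` is hypothetical. The `np.argsort` call and the `np.add.at` scatter are examples of the sorting and scattered-write operations the abstract describes as GPU-inefficient.

```python
import numpy as np


def tome_style_merge(x: np.ndarray, r: int) -> np.ndarray:
    """Simplified ToMe-style bipartite merge (hypothetical, illustrative only).

    x: (N, C) token features for one image. r: tokens to remove, r <= N // 2.
    Returns (N - r, C) features.
    """
    a, b = x[0::2], x[1::2]                        # alternate split into sets A and B
    an = a / np.linalg.norm(a, axis=-1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=-1, keepdims=True)
    scores = an @ bn.T                             # cosine similarity, shape (|A|, |B|)

    node_max = scores.max(axis=-1)                 # best match score for each A token
    node_idx = scores.argmax(axis=-1)              # index of that best-matching B token
    order = np.argsort(-node_max)                  # sorting step: most similar edges first
    src_idx = order[:r]                            # A tokens merged into B
    keep_idx = order[r:]                           # A tokens kept as-is

    # Scattered writes: accumulate merged A tokens into their B partners, then average.
    b_sum = b.copy()
    counts = np.ones(len(b))
    np.add.at(b_sum, node_idx[src_idx], a[src_idx])
    np.add.at(counts, node_idx[src_idx], 1.0)
    b_merged = b_sum / counts[:, None]

    return np.concatenate([a[keep_idx], b_merged], axis=0)
```

On 16 tokens, `tome_style_merge(x, 4)` returns 12 tokens. On a GPU, the data-dependent sort and the indexed accumulation are what make this pattern harder to execute efficiently than dense matrix products.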

METHOD

Full abstract

Token compression is crucial for mitigating the quadratic complexity of self-attention in Vision Transformers (ViTs), which often process many input tokens. Existing methods, such as ToMe, rely on GPU-inefficient operations (e.g., sorting, scattered writes), introducing overheads that limit their effectiveness. We introduce MaMe, a training-free, differentiable token merging method based entirely on matrix operations, which makes it GPU-friendly and able to accelerate ViTs. Additionally, we present MaRe, its inverse operation, for token restoration, forming a MaMe+MaRe pipeline for image synthesis. When applied to pre-trained models, MaMe doubles ViT-B throughput with a 2% accuracy drop. Notably, fine-tuning the last layer with MaMe boosts ViT-B accuracy by 1.0% at 1.1x speed. In SigLIP2-B@512 zero-shot classification, MaMe provides 1.3x acceleration with negligible performance degradation. In video tasks, MaMe accelerates VideoMAE-L by 48.5% on Kinetics-400 with only a 0.84% accuracy loss. Furthermore, MaMe achieves simultaneous improvements in both performance and speed on some tasks. In image synthesis, the MaMe+MaRe pipeline enhances quality while reducing Stable Diffusion v2.1 generation latency by 31%. Collectively, these results demonstrate the effectiveness of MaMe and MaRe in accelerating vision models. The code is available at https://github.com/cominder/mame.
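The abstract does not spell out MaMe's algorithm. As a rough illustration of what "token merging based entirely on matrix operations" with a matching restore step can look like, the sketch below builds a soft assignment matrix `A` from cosine similarity to a few anchor tokens, merges with a single matrix product, and restores with `A @ merged`. Everything here is an assumption for illustration: the strided anchor choice, the temperature `tau`, and the functions `matrix_merge` and `matrix_restore` are hypothetical and are not the authors' MaMe or MaRe. The linked repository is the reference for the real implementation.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)        # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def matrix_merge(x: np.ndarray, m: int, tau: float = 0.1):
    """Hypothetical matrix-only merge of N tokens (N, C) into m tokens (m <= N).

    Anchors are every (N // m)-th token, an assumed, deterministic choice.
    Returns merged tokens (m, C) and the assignment matrix A (N, m).
    """
    n = x.shape[0]
    anchors = x[:: n // m][:m]                     # (m, C), fixed strided selection
    xn = x / np.linalg.norm(x, axis=-1, keepdims=True)
    an = anchors / np.linalg.norm(anchors, axis=-1, keepdims=True)
    A = softmax(xn @ an.T / tau, axis=-1)          # (N, m), each row sums to 1
    # Each anchor's own row puts its largest weight on that anchor's column,
    # so every column sum is at least 1/m and the division below is safe.
    weights = A / A.sum(axis=0, keepdims=True)     # column-normalise: weighted means
    merged = weights.T @ x                         # (m, C), dense matmul only
    return merged, A


def matrix_restore(merged: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Hypothetical inverse: spread merged tokens back to the original N positions."""
    return A @ merged                              # (N, C)
```

Every step is a softmax or a dense matrix product, so the pipeline is differentiable and contains no sort or indexed scatter. Note that `matrix_restore` is only an approximate inverse: `A` has rank at most `m`, so restored tokens are smoothed reconstructions rather than exact copies of the inputs.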

RESULT

ScienceToStartup currently rates this paper 8.0/10 on its public viability pass. The authors report that MaMe improves both performance and speed simultaneously on some tasks. A public repository is linked, so build verification can inspect implementation…

WHY NOW

Vision Transformers moved forward this cycle (last verified April 2026). The public score is 8.0/10, and implementation evidence is available through a linked repository.

Competitive landscape

Segment: Vision Transformers

Adoption evidence: Public code linked for build inspection

Commercial read: 8.0/10 public viability

Direct, adjacent, substitute, and unknown competitors: not classified

