ARXIV:2603.15975 · MOTION GENERATION · SUBMITTED 19 MAR · 21:31 UTC

Verified: source PDF linked · Partial: paper pack has 3 of 4 citation fields filled · Missing: authors · Partial: proof status unverified

UMO: Unified In-Context Learning Unlocks Motion Foundation Model Priors


UMO is a unified framework that adapts a pretrained text-to-motion foundation model so that a single model supports a range of additional motion generation tasks.

Status: blocked on code · Score: 9.0 · Evidence: unverified

Opportunity summary

Pain: pretrained motion foundation models are limited to text-to-motion synthesis, and prior work adapts their priors to each downstream task separately.

Evidence: 0 refs | 0 sources | 33% coverage

Blocker: evidence unverified


PROBLEM

Large-scale motion foundation models (LFMs) learn strong generative priors for text-to-motion synthesis, but how to leverage these single-purpose models effectively and efficiently in more diverse cross-modal and in-context motion generation tasks remains largely unclear. Prior work typically adapts the pretrained priors to each downstream task separately.

METHOD

Full abstract

Large-scale foundation models (LFMs) have recently made impressive progress in text-to-motion generation by learning strong generative priors from massive 3D human motion datasets and paired text descriptions. However, how to effectively and efficiently leverage such single-purpose motion LFMs, i.e., text-to-motion synthesis, in more diverse cross-modal and in-context motion generation downstream tasks remains largely unclear. Prior work typically adapts pretrained generative priors to individual downstream tasks in a task-specific manner. In contrast, our goal is to unlock such priors to support a broad spectrum of downstream motion generation tasks within a single unified framework. To bridge this gap, we present UMO, a simple yet general unified formulation that casts diverse downstream tasks into compositions of atomic per-frame operations, enabling in-context adaptation to unlock the generative priors of pretrained DiT-based motion LFMs. Specifically, UMO introduces three learnable frame-level meta-operation embeddings to specify per-frame intent and employs lightweight temporal fusion to inject in-context cues into the pretrained backbone, with negligible runtime overhead compared to the base model. With this design, UMO finetunes the pretrained model, originally limited to text-to-motion generation, to support diverse previously unsupported tasks, including temporal inpainting, text-guided motion editing, text-serialized geometric constraints, and multi-identity reaction generation. Experiments demonstrate that UMO consistently outperforms task-specific and training-free baselines across a wide range of benchmarks, despite using a single unified model. Code and model will be publicly available. Project Page: https://oliver-cong02.github.io/UMO.github.io/
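The abstract describes casting each task as a composition of atomic per-frame operations, with learnable meta-operation embeddings injected into the backbone through lightweight temporal fusion. The Python sketch below illustrates that idea under stated assumptions: the three operation names (GENERATE, PRESERVE, CONDITION), the helper functions, and the fixed three-tap smoothing filter used in place of UMO's learned temporal fusion are all hypothetical and are not the authors' code, which has not been released.

```python
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL labels for the three frame-level meta-operations. The abstract
# says UMO learns three meta-operation embeddings but does not name them here;
# GENERATE / PRESERVE / CONDITION are illustrative assumptions.
GENERATE, PRESERVE, CONDITION = 0, 1, 2


def ops_from_segments(segments: Sequence[Tuple[int, int]]) -> List[int]:
    """Compose a per-frame op sequence from (length, op) segments."""
    ops: List[int] = []
    for length, op in segments:
        ops.extend([op] * length)
    return ops


def inpainting_ops(num_frames: int, known: Sequence[int]) -> List[int]:
    """Temporal inpainting: preserve the given frames, generate the gaps."""
    known_set = set(known)
    return [PRESERVE if t in known_set else GENERATE for t in range(num_frames)]


def fuse_meta_ops(
    tokens: List[List[float]],
    ops: Sequence[int],
    embeddings: Dict[int, List[float]],
    kernel: Sequence[float] = (0.25, 0.5, 0.25),
) -> List[List[float]]:
    """Add a temporally smoothed meta-op embedding to each frame token.

    ASSUMPTION: a fixed 3-tap filter with edge replication stands in for
    UMO's lightweight temporal fusion; in a real model the embeddings
    (and likely the fusion) would be learned.
    """
    num_frames, dim = len(tokens), len(tokens[0])
    intent = [embeddings[op] for op in ops]  # per-frame intent vectors
    half = len(kernel) // 2
    fused = []
    for t in range(num_frames):
        cue = [0.0] * dim
        for k, weight in enumerate(kernel):
            # Clamp the neighbor index so edge frames reuse their own intent.
            s = min(max(t + k - half, 0), num_frames - 1)
            for d in range(dim):
                cue[d] += weight * intent[s][d]
        # Inject the cue additively into the backbone's frame token.
        fused.append([tokens[t][d] + cue[d] for d in range(dim)])
    return fused


if __name__ == "__main__":
    ops = inpainting_ops(6, known=[0, 1, 4, 5])
    print(ops)  # [1, 1, 0, 0, 1, 1]
    emb = {GENERATE: [0.0, 0.0], PRESERVE: [1.0, 0.0], CONDITION: [0.0, 1.0]}
    tokens = [[0.0, 0.0] for _ in range(6)]
    print([row[0] for row in fuse_meta_ops(tokens, ops, emb)])
    # [1.0, 0.75, 0.25, 0.25, 0.75, 1.0]
```

Under the same assumptions, text-guided editing of a middle segment could be written as `ops_from_segments([(2, PRESERVE), (3, GENERATE), (1, PRESERVE)])`. Adding the cue to existing frame tokens, rather than appending extra tokens, keeps the backbone's sequence length unchanged, which is one plausible way to keep runtime overhead small.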

RESULT

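ScienceToStartup currently rates this paper 9.0/10 on its public viability pass. The paper's stated goal is to unlock pretrained motion priors to support a broad spectrum of downstream motion generation tasks within a single unified framework, and its experiments report that UMO consistently outperforms task-specific and training-free baselines across a wide range of benchmarks while using a single model.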

WHY NOW

Motion Generation moved forward this cycle; last verified April 2026. Public score 9.0/10.


Competitive landscape

Segment: Motion Generation

Adoption evidence: No public code link in the paper record yet; the authors state that code and model will be publicly available.

Commercial read: 9.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


Date published: 2026-03-16 (arXiv: https://arxiv.org/abs/2603.15975)

Why now? The gaming and animation industries are rapidly adopting AI tools to reduce costs and enhance creativity, and there is growing demand for unified AI solutions that avoid the fragmentation of task-specific models, making this a timely entry into the market.

Practical use case: A video game developer uses the product to generate and edit character animations during development, for example filling in missing motion frames (temporal inpainting) or adjusting movements based on text descriptions, to speed up production cycles.
