ARXIV:2603.26553 · GENERATIVE VIDEO · SUBMITTED 30 MAR · 22:19 UTC · FRESHNESS STALE

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

HolisticSemGes: Semantic Grounding of Holistic Co-Speech Gesture Generation with Contrastive Flow-Matching

Lanmiao Liu · Esam Ghaleb · Aslı Özyürek · Zerrin Yumak · arXiv

A novel AI model generates semantically grounded co-speech gestures by learning from both correct and incorrect audio-text pairings, improving cross-modal consistency and outperforming existing methods.

Ship in 2-4 weeks | Score 7.0 | Evidence unverified

Opportunity summary

Pain A novel AI model generates semantically grounded co-speech gestures by learning from both correct and incorrect audio-text pairings, improving cross-modal consistency and outperforming existing methods.

Evidence 53 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

METHOD

Full abstract

While the field of co-speech gesture generation has seen significant advances, producing holistic, semantically grounded gestures remains a challenge. Existing approaches rely on external semantic retrieval methods, which limit their generalisation capability due to dependency on predefined linguistic rules. Flow-matching-based methods produce promising results; however, the network is optimised using only semantically congruent samples without exposure to negative examples, leading to learning rhythmic gestures rather than sparse motion, such as iconic and metaphoric gestures. Furthermore, by modelling body parts in isolation, the majority of methods fail to maintain crossmodal consistency. We introduce a Contrastive Flow Matching-based co-speech gesture generation model that uses mismatched audio-text conditions as negatives, training the velocity field to follow the correct motion trajectory while repelling semantically incongruent trajectories. Our model ensures cross-modal coherence by embedding text, audio, and holistic motion into a composite latent space via cosine and contrastive objectives. Extensive experiments and a user study demonstrate that our proposed approach outperforms state-of-the-art methods on two datasets, BEAT2 and SHOW.
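The contrastive flow-matching idea in the abstract can be illustrated with a toy loss. The sketch below is a minimal illustration, not the authors' implementation. It assumes a straight-line path from noise to motion, a mean-squared-error attraction term, and a repulsion term weighted by a hypothetical `lam`. The function names, the weight, and the way the negative velocity is formed are all assumptions, since this page does not give the paper's exact objective.

```python
# Toy sketch of a contrastive flow-matching loss (all names are hypothetical).
# Vectors are plain Python lists standing in for motion latents.

def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 to data x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight-line path, constant in t: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def mse(u, v):
    """Mean squared difference between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def contrastive_fm_loss(v_pred, v_pos, v_neg, lam=0.05):
    """Attract the prediction to the matched velocity and repel it from the mismatched one.

    v_pred: velocity a model predicts under the correct audio-text condition
    v_pos:  target velocity toward the motion that matches the condition
    v_neg:  target velocity toward a motion tied to a mismatched condition (assumption)
    lam:    repulsion weight (assumed value, not taken from the paper)
    """
    return mse(v_pred, v_pos) - lam * mse(v_pred, v_neg)

# Toy data: one noise sample, one matched motion, one mismatched motion.
noise = [0.0, 0.0]
matched_motion = [1.0, 2.0]
mismatched_motion = [-1.0, 0.0]

x_half = interpolate(noise, matched_motion, 0.5)     # where a model would be queried
v_pos = target_velocity(noise, matched_motion)
v_neg = target_velocity(noise, mismatched_motion)

# Two candidate predictions: one exactly on the matched velocity,
# one halfway between the matched and mismatched velocities.
on_target = list(v_pos)
halfway = [(a + b) / 2.0 for a, b in zip(v_pos, v_neg)]

loss_on_target = contrastive_fm_loss(on_target, v_pos, v_neg)
loss_halfway = contrastive_fm_loss(halfway, v_pos, v_neg)
```

With these toy numbers, the prediction that equals the matched velocity scores a lower loss than the one halfway toward the mismatched velocity, which is the behaviour the abstract describes as following the correct trajectory while repelling incongruent ones. In a real model the repulsion weight would need tuning, because a large `lam` makes the loss unbounded below.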

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Flow-matching-based methods produce promising results; however, the network is optimised using only semantically congruent samples without exposure to negative examples, leading to learning rhythmic…

WHY NOW

Generative Video moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain A novel AI model generates semantically grounded co-speech gestures by learning from both correct and incorrect audio-text pairings, improving cross-modal consistency and outperforming existing methods.

Evidence 53 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

Competitive landscape

Segment

Generative Video

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Source record: https://arxiv.org/abs/2603.26553 · Published 2026-03-27T16:11:44.000Z · Free to access
