ARXIV:2603.11971 · EMOTION RECOGNITION · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: 3 of 4 citation fields filled (partial) · Missing fields: authors (missing) · Proof: unverified proof status (partial)

Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling

arXiv

A multimodal framework for robust emotion recognition in video data using cross-attention and temporal modeling.

Blocked on Code › Score 6.0 · Evidence unverified

Opportunity summary

Pain A multimodal framework for robust emotion recognition in video data using cross-attention and temporal modeling.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

A multimodal framework for robust emotion recognition in video data using cross-attention and temporal modeling. Relying on a single modality, such as facial expressions or speech, is often insufficient to capture these complex emotional…

METHOD

Full abstract

Emotion recognition in in-the-wild video data remains a challenging problem due to large variations in facial appearance, head pose, illumination, background noise, and the inherently dynamic nature of human affect. Relying on a single modality, such as facial expressions or speech, is often insufficient to capture these complex emotional cues. To address this issue, we propose a multimodal emotion recognition framework for the Expression (EXPR) Recognition task in the 10th Affective Behavior Analysis in-the-wild (ABAW) Challenge. Our approach leverages large-scale pre-trained models, namely CLIP for visual encoding and Wav2Vec 2.0 for audio representation learning, as frozen backbone networks. To model temporal dependencies in facial expression sequences, we employ a Temporal Convolutional Network (TCN) over fixed-length video windows. In addition, we introduce a bi-directional cross-attention fusion module, in which visual and audio features interact symmetrically to enhance cross-modal contextualization and capture complementary emotional information. A lightweight classification head is then used for final emotion prediction. We further incorporate a text-guided contrastive objective based on CLIP text features to encourage semantically aligned visual representations. Experimental results on the ABAW 10th EXPR benchmark show that the proposed framework provides a strong multimodal baseline and achieves improved performance over unimodal modeling. These results demonstrate the effectiveness of combining temporal visual modeling, audio representation learning, and cross-modal fusion for robust emotion recognition in unconstrained real-world environments.
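
The bi-directional cross-attention fusion described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the abstract does not specify projection layers, head counts, pooling, or feature dimensions, so the version below assumes single-head scaled dot-product attention with no learned weights, mean pooling over time, and concatenation of the two pooled streams. All function names and the toy feature values are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Each query vector attends over every vector in keys_values.

    Assumption: keys and values are the same vectors and no learned
    projections are applied; a trained model would add both.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended.append([
            sum(w * kv[d] for w, kv in zip(weights, keys_values))
            for d in range(dim)
        ])
    return attended


def mean_pool(sequence):
    """Average a sequence of vectors over the time axis."""
    return [sum(column) / len(sequence) for column in zip(*sequence)]


def bidirectional_fusion(visual, audio):
    """Symmetric fusion: visual attends to audio and audio attends to visual."""
    visual_to_audio = cross_attend(visual, audio)  # visual queries, audio keys/values
    audio_to_visual = cross_attend(audio, visual)  # audio queries, visual keys/values
    # Assumption: pooled streams are concatenated before a classification head.
    return mean_pool(visual_to_audio) + mean_pool(audio_to_visual)


# Toy features: 3 visual frames and 2 audio frames, each 4-dimensional.
visual = [[1.0, 0.0, 0.5, 0.2], [0.9, 0.1, 0.4, 0.3], [0.0, 1.0, 0.2, 0.8]]
audio = [[0.5, 0.5, 0.1, 0.0], [0.2, 0.8, 0.3, 0.9]]
fused = bidirectional_fusion(visual, audio)
```

The two modalities may have different sequence lengths, as here, but must share a feature dimension; in practice a linear projection would map the CLIP and Wav2Vec 2.0 features into a common space first.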

RESULT

ScienceToStartup currently rates this 6.0/10 on the public viability pass. Experimental results on the ABAW 10th EXPR benchmark show that the proposed framework provides a strong multimodal baseline and achieves improved performance over unimodal…

WHY NOW

Emotion Recognition moved forward this cycle; last verified April 2026. Public score 6.0/10.

Opportunity summary

Score 6.0

Pain A multimodal framework for robust emotion recognition in video data using cross-attention and temporal modeling.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

A multimodal framework for robust emotion recognition in video data using cross-attention and temporal modeling.

Paper Pack

10.48550/arXiv.2603.11971

Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling

A multimodal framework for robust emotion recognition in video data using cross-attention and temporal modeling.

Abstract

Source availability

PDF linked

The paper record includes a public PDF URL.

Extraction status

Derived fallback

Read summaries are estimated from adjacent metadata, not verified extraction rows.

Proof status

unverified

0 refs; 0 sources; 17% coverage.

What was readable

linked · on file · not materialized · derived fallback · 37 indexed · not indexed

Derived fallback: Estimated from adjacent evidence; not verified from source.

Viability

6.0

Time to MVP

MVP estimate missing

Commercial

No commercial flags on file

Claim map

Abstract-backed public claims while anchored extraction refreshes.

Strong 0 · Mixed 0 · Weak 4

Evidence (partial): A multimodal framework for robust emotion recognition in video data using cross-attention and temporal modeling. Relying on a single modality, such as facial expressions or speech, is often insufficient to capture these complex emotional cues.
Implication (partial): Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification (partial): partial

Evidence (partial): Emotion recognition in in-the-wild video data remains a challenging problem due to large variations in facial appearance, head pose, illumination, background noise, and the inherently dynamic nature of human affect. Relying on a single modality, such as facial expressions or speech, is often insufficient to capture these complex emotional cues.
Implication (partial): Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification (partial): partial

Evidence (partial): ScienceToStartup currently rates this 6.0/10 on the public viability pass. Experimental results on the ABAW 10th EXPR benchmark show that the proposed framework provides a strong multimodal baseline and achieves improved performance over unimodal modeling.
Implication (partial): Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification (partial): partial

Evidence (partial): Emotion Recognition moved forward this cycle; last verified April 2026. Public score 6.0/10.
Implication (partial): Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification (partial): partial

Constellation map

Paper-native neighborhood for concepts, methods, materials, markets, and competitors. Missing lanes stay labeled instead of disappearing behind commercialization gates.

Concepts

not indexed

Methods

Materials

PDF linked

Markets

Emotion Recognition

Competitors

not indexed

Competitive landscape

A multimodal framework for robust emotion recognition in video data using cross-attention and temporal modeling.

Segment

Emotion Recognition

Adoption evidence

No public code link in the paper record yet

Commercial read

6.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Buzz

No indexed public discussion is attached to 2603.11971 yet. That is a visibility signal, not a blank module: the monitor is watching the public channels below.

Hacker News

Not indexed yet

Bluesky

Not indexed yet

References (37)

From Emotions to Violence: Multimodal Fine-Grained Behavior Analysis at the 9th ABAW
2025 · D. Kollias, S. Zafeiriou et al.

Advancements in Affective and Behavior Analysis: The 8th ABAW Workshop and Competition
2025 · Dimitrios Kollias, Panagiotis Tzirakis et al.

DVD: A Comprehensive Dataset for Advancing Violence Detection in Real-World Scenarios
2025 · D. Kollias, D. C. Senadeera et al.

Emotion Recognition with CLIP and Sequential Learning
2025 · Weiwei Zhou, Chenkun Ling et al.

Behaviour4All: in-the-wild Facial Behaviour Analysis Toolkit
2024 · Dimitrios Kollias, Chunchang Shao et al.

Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
2024 · Joe Dhanith, Shravan Venkatraman et al.

Enhancing Facial Expression Recognition through Dual-Direction Attention Mixed Feature Networks: Application to 7th ABAW Challenge
2024 · Josep Cabacas-Maso, Elena Ortega-Beltrán et al.

7th ABAW Competition: Multi-Task Learning and Compound Expression Recognition
2024 · D. Kollias, S. Zafeiriou et al.

Affective Behaviour Analysis via Integrating Multi-Modal Knowledge
2024 · Wei Zhang, Feng Qiu et al.

The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition
2024 · D. Kollias, Panagiotis Tzirakis et al.

Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond
2024 · D. Kollias, V. Sharmanska et al.

Multi-Label Compound Expression Recognition: C-EXPR Database & Network
2023 · D. Kollias

Leveraging TCN and Transformer for effective visual-audio fusion in continuous emotion recognition
2023 · Weiwei Zhou, Jiada Lu et al.

ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Emotional Reaction Intensity Estimation Challenges
2023 · D. Kollias, Panagiotis Tzirakis et al.

ABAW: Learning from Synthetic Data & Multi-Task Learning Challenges
2022 · D. Kollias

NR-DFERNet: Noise-Robust Network for Dynamic Facial Expression Recognition
2022 · Hanting Li, Ming-Fa Sui et al.

Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in the Wild
2022 · Fuyan Ma, Bin Sun et al.

Continuous Emotion Recognition using Visual-audio-linguistic Information: A Technical Report for ABAW3
2022 · Su Zhang, Ruyi An et al.

Conditional Prompt Learning for Vision-Language Models
2022 · Kaiyang Zhou, Jingkang Yang et al.

ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Multi-Task Learning Challenges
2022 · D. Kollias

Showing 20 of 37 references

CITED BY

No citing papers are indexed in the public S2S graph yet. This is an explicit zero-signal state, not a hidden lookup.

Foundation

Prior Work: HSEmotion Team at ABAW-10 Competition: Facial Expression Recognition, Valence-Arousal Estimation, Action Unit Detection and Fine-Grained Violence Classification

6.0

Extension

Builds On This: Team LEYA in 10th ABAW Competition: Multimodal Ambivalence/Hesitancy Recognition Approach

4.0

Builds On This: Foundation Model Embeddings Meet Blended Emotions: A Multimodal Fusion Approach for the BLEMORE Challenge

4.0

Builds On This: Dual-Model Prediction of Affective Engagement and Vocal Attractiveness from Speaker Expressiveness in Video Learning

5.0

Commercially relevant

Higher Viability: A Two-Stage Dual-Modality Model for Facial Emotional Expression Recognition

7.0

Higher Viability: Solution to the 10th ABAW Expression Recognition Challenge: A Robust Multimodal Framework with Safe Cross-Attention and Modality Dropout

7.0

Higher Viability: Anchoring Emotions in Text: Robust Multimodal Fusion for Mimicry Intensity Estimation

7.0

Higher Viability: Hierarchical Granularity Alignment and State Space Modeling for Robust Multimodal AU Detection in the Wild

7.0

Conflicting

Competing Approach: Team RAS in 10th ABAW Competition: Multimodal Valence and Arousal Estimation Approach

4.0

Competing Approach: Ordering Matters: Rank-Aware Selective Fusion for Blended Emotion Recognition

3.0

Agent drawer

5 surfaces preserved for agents. Humans can ignore.

Developer contracts, payload previews, evidence maps, and run controls stay here instead of the Read, Build, and Track workspace.

Run context

Paper: 2603.11971
Route: /paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling
Active tab: read
Artifact: multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling

Available agents

Read extractor
Build planner
Track monitor
Competitive mapper
Related-paper scout

API/MCP endpoints

REST paper pack API: /api/v1/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling/paper-pack
REST build passport API: /api/v1/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling/build-passport
REST OpenAPI: /api/openapi.json
MCP descriptor: /api/mcp
MCP resource: sciencetostartup://surfaces/paper-workspace

Tool contracts

paper_pack, build_passport, opportunity_kernel, foresight, source_proof, evidence_state

Payload preview

{
  "contract_version": "paper-r2",
  "paper_id": "656e4479-ddab-4cd8-8125-f86428325637",
  "arxiv_id": "2603.11971",
  "canonical_route": "/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling",
  "active_tab": "synced from current hash by the drawer client",
  "selected_artifact": "multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling",
  "endpoints": {
    "paper_pack": "/api/v1/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling/paper-pack",
    "build_passport": "/api/v1/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling/build-passport",
    "mcp_resource": "sciencetostartup://surfaces/paper-workspace"
  }
}
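
As a rough illustration of how an agent might consume this payload, the sketch below parses a trimmed copy of it and turns the relative REST paths into absolute URLs. It assumes the REST paths are served from the same host that appears in the curl example on this page (https://sciencetostartup.com); the function name `resolve_endpoints` is hypothetical, and non-HTTP entries such as the MCP resource URI are passed through unchanged.

```python
import json

# Assumption: REST paths resolve against the host used in this page's curl example.
BASE_URL = "https://sciencetostartup.com"

# Trimmed copy of the payload preview (only the fields used below).
PAYLOAD_TEXT = """
{
  "contract_version": "paper-r2",
  "arxiv_id": "2603.11971",
  "endpoints": {
    "paper_pack": "/api/v1/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling/paper-pack",
    "build_passport": "/api/v1/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling/build-passport",
    "mcp_resource": "sciencetostartup://surfaces/paper-workspace"
  }
}
"""


def resolve_endpoints(payload, base_url=BASE_URL):
    """Return endpoint name -> URL, prefixing the host onto relative paths only."""
    resolved = {}
    for name, target in payload["endpoints"].items():
        if target.startswith("/"):
            resolved[name] = base_url.rstrip("/") + target
        else:
            # Already carries its own scheme (for example the MCP resource URI).
            resolved[name] = target
    return resolved


payload = json.loads(PAYLOAD_TEXT)
resolved = resolve_endpoints(payload)
for name, url in resolved.items():
    print(f"{name}: {url}")
```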

Schema validation

paper-r2 contract: present
JSON-LD twin: SSR emitted
OpenAPI path parity: /api/openapi.json
MCP resource parity: paper-workspace

Job trace

queued: drawer opened by user action
running: inspect or copy payload
succeeded: payload available in SSR
failed: route errors appear in evidence cards

Evidence map

sources used: page freshness, source proof anchors, JSON-LD
missing sources: exposed by PaperPack and EvidenceState chips
derived fallbacks: marked unverified before handoff

Page Freshness

Canonical route, proof status, last verified, refs, sources, and coverage.

Paper proof surface

Canonical route: /paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling

stale

Proof freshness: stale
Proof status: unverified
Display score: 6/10
Last proof check: 2026-04-02
Score updated: 2026-04-02
Score fresh until: 2026-05-02
References: 0
Source count: 0
Coverage: 17%

This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.
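
The key/value lines of the receipt above are easy to read programmatically. The sketch below parses a copy of them and compares a given date against the "Score fresh until" field. How the site itself decides that proof freshness is "stale" is not documented on this page, so the date comparison is only an assumption about one plausible check; `parse_receipt` and `score_is_fresh` are hypothetical helper names.

```python
from datetime import date

# Copy of the freshness fields shown on this page.
RECEIPT = """\
Proof freshness: stale
Proof status: unverified
Display score: 6/10
Last proof check: 2026-04-02
Score updated: 2026-04-02
Score fresh until: 2026-05-02
References: 0
Source count: 0
Coverage: 17%
"""


def parse_receipt(text):
    """Split 'Key: value' lines into a dictionary, skipping malformed lines."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(": ")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def score_is_fresh(fields, today):
    """Assumed rule: the score counts as fresh up to and including the listed date."""
    return today <= date.fromisoformat(fields["Score fresh until"])


fields = parse_receipt(RECEIPT)
print(fields["Proof freshness"], fields["Coverage"])
print(score_is_fresh(fields, date(2026, 4, 15)))
```

Note that the score window and the proof freshness flag are separate fields: the receipt can report a score that is still inside its window while the proof itself is marked stale.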

OpenAlex: pending — this preprint is not yet indexed by OpenAlex.

Agent Handoff

Endpoint list, payload shape, route context, and copyable handoff data.

Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling

Canonical ID multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling | Route /paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling
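
The same call can be sketched in Python with only the standard library. The URL is the one from the curl example; the `Accept` header, the timeout, and the expectation that the response body is JSON are assumptions, since the page does not document the response format. The helper names are hypothetical, and the network call is wrapped in a function so that nothing is fetched on import.

```python
import json
import urllib.request

HANDOFF_URL = (
    "https://sciencetostartup.com/api/v1/agent-handoff/paper/"
    "multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling"
)


def build_handoff_request(url=HANDOFF_URL):
    """Build (but do not send) a GET request for the handoff endpoint."""
    # Assumption: the endpoint honours a JSON Accept header.
    return urllib.request.Request(
        url, headers={"Accept": "application/json"}, method="GET"
    )


def fetch_handoff(url=HANDOFF_URL, timeout=10.0):
    """Send the request and decode the body, assuming it is UTF-8 JSON."""
    with urllib.request.urlopen(build_handoff_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_handoff_request()
print(request.get_method(), request.full_url)
```

Calling `fetch_handoff()` performs the actual request and raises `urllib.error.URLError` or `json.JSONDecodeError` if the endpoint is unreachable or does not return JSON.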

MCP example

{
  "tool": "get_paper",
  "arguments": {
    "arxiv_id": "2603.11971"
  }
}

source_context

{
  "surface": "paper",
  "mode": "paper",
  "query": "Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling",
  "normalized_query": "2603.11971",
  "route": "/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling",
  "paper_ref": "multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Buildability Receipt

Verdict, compute envelope, blockers, signature state, and receipt links.

Paper proof page receipt window

Watch and verify: Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling

/buildability/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling

Subject: Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling

Verdict

Watch

Verdict is Watch because viability or proof quality is intermediate and should be re-evaluated before execution.

Time to first demo

Insufficient data

No first-demo timestamp, owner estimate, or elapsed demo receipt is attached to this surface.

Compute envelope

Structured compute envelope

Insufficient data

No data, compute, hardware, memory, latency, dependency, or serving requirement receipt is attached.

Evidence ids

Receipt path

/buildability/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling

Paper ref

multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling

arXiv id

2603.11971

Freshness

Generated at

2026-04-02T02:30:40.136Z

Evidence freshness

stale

Last verification

2026-04-02T02:30:40.136Z

Sources

References

Coverage

17%

Hash state

Lineage hash

d3e154e612d0be8b1840ddf28c91e921cfe4e9da7815672f7a00621a98e47f59

Canonical opportunity-kernel lineage hash.

Signature state

External signature

unsigned_external

No founder, registry, pilot, or production-adoption signature is attached to this receipt.

Verification

not_verified

Verification is blocked until an external signature is provided.

Blockers

Missing: repo_url
Missing: references
Missing: proof_status
Missing: distribution_readiness_scores
Missing: paper_extraction_scorecards
Unknown: distribution readiness has not been computed yet
Unknown: proof verification has not been recorded yet

Verification pending / evidence receipt incomplete

repo_url

references

Missing proof, requirement, signature, approval, adoption, or telemetry fields are blockers and must not be inferred.

Source Proof anchors

Visual citations from the paper document graph.

JSON-LD twin

The application/ld+json payload rendered for agents.

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebPage",
      "@id": "https://sciencetostartup.com/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling#webpage",
      "url": "https://sciencetostartup.com/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling",
      "name": "Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling",
      "description": "A multimodal framework for robust emotion recognition in video data using cross-attention and temporal modeling.",
      "isPartOf": {
        "@id": "https://sciencetostartup.com/#website"
      }
    },
    {
      "@type": "ScholarlyArticle",
      "@id": "https://sciencetostartup.com/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling#scholarlyArticle",
      "headline": "Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling",
      "description": "A multimodal framework for robust emotion recognition in video data using cross-attention and temporal modeling.",
      "url": "https://sciencetostartup.com/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling",
      "sameAs": "https://arxiv.org/abs/2603.11971",
      "identifier": {
        "@type": "PropertyValue",
        "propertyID": "arXiv",
        "value": "2603.11971"
      },
      "isAccessibleForFree": true,
      "isPartOf": {
        "@id": "https://sciencetostartup.com/#website"
      },
      "datePublished": "2026-03-12T14:20:29.000Z",
      "citation": [
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "cb7e32527530b01dc4402127b42eedd2ae5b8e8f"
          },
          "url": "https://www.semanticscholar.org/paper/cb7e32527530b01dc4402127b42eedd2ae5b8e8f"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "3bc4d2c1291ba2dc9c1b8b2f3a16f5cc6517975b"
          },
          "url": "https://www.semanticscholar.org/paper/3bc4d2c1291ba2dc9c1b8b2f3a16f5cc6517975b"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "a9ef03e616222795d3c5f20164e2d67b8371325e"
          },
          "url": "https://www.semanticscholar.org/paper/a9ef03e616222795d3c5f20164e2d67b8371325e"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "15ce743750eeabccc3eaf86fbc6d93a790a2ea33"
          },
          "url": "https://www.semanticscholar.org/paper/15ce743750eeabccc3eaf86fbc6d93a790a2ea33"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "7fd89f02f9b5caa9cfbb6b1301b8d88fff373d3f"
          },
          "url": "https://www.semanticscholar.org/paper/7fd89f02f9b5caa9cfbb6b1301b8d88fff373d3f"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "5d1d131137ba166c50e98779142cfac876e0683b"
          },
          "url": "https://www.semanticscholar.org/paper/5d1d131137ba166c50e98779142cfac876e0683b"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "23facc07f4ee2627697eeaa3be336250775bb440"
          },
          "url": "https://www.semanticscholar.org/paper/23facc07f4ee2627697eeaa3be336250775bb440"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "7f00de83fe8527dc157eadd6427b0e33c797d7d9"
          },
          "url": "https://www.semanticscholar.org/paper/7f00de83fe8527dc157eadd6427b0e33c797d7d9"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "4c68b2b3479c4bfaab45986b4409b672b006d327"
          },
          "url": "https://www.semanticscholar.org/paper/4c68b2b3479c4bfaab45986b4409b672b006d327"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "bf98a07a70121ec89816515c3d8fd8394acc531e"
          },
          "url": "https://www.semanticscholar.org/paper/bf98a07a70121ec89816515c3d8fd8394acc531e"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "16d26f3adaf77d477b60b96081c488ae26abb730"
          },
          "url": "https://www.semanticscholar.org/paper/16d26f3adaf77d477b60b96081c488ae26abb730"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "06778541d71c8f7edb7bc9ac9e6bdd193f78f7f3"
          },
          "url": "https://www.semanticscholar.org/paper/06778541d71c8f7edb7bc9ac9e6bdd193f78f7f3"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "801ee01e63412f9d3412209099dc73510badc0f7"
          },
          "url": "https://www.semanticscholar.org/paper/801ee01e63412f9d3412209099dc73510badc0f7"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "c261f665804163d91535c5dd37a579fca346ee00"
          },
          "url": "https://www.semanticscholar.org/paper/c261f665804163d91535c5dd37a579fca346ee00"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "7a177d931bedde0dca38f03e79dba01ac6545f13"
          },
          "url": "https://www.semanticscholar.org/paper/7a177d931bedde0dca38f03e79dba01ac6545f13"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "98591ed0db3b2dcb71916d71a7efb422d0b0232e"
          },
          "url": "https://www.semanticscholar.org/paper/98591ed0db3b2dcb71916d71a7efb422d0b0232e"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "84ec918a6bd47f77c0d58b20227dffac12427e6a"
          },
          "url": "https://www.semanticscholar.org/paper/84ec918a6bd47f77c0d58b20227dffac12427e6a"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "9b7ab6491db5de0358b93e164ebfed768b72168a"
          },
          "url": "https://www.semanticscholar.org/paper/9b7ab6491db5de0358b93e164ebfed768b72168a"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "b879450f50a6113f44a5baf0bcd5b4331eeb7bbc"
          },
          "url": "https://www.semanticscholar.org/paper/b879450f50a6113f44a5baf0bcd5b4331eeb7bbc"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "cc340c9af8aa4ac700a7cb73df01b7ca6409705c"
          },
          "url": "https://www.semanticscholar.org/paper/cc340c9af8aa4ac700a7cb73df01b7ca6409705c"
        }
      ],
      "additionalProperty": [
        {
          "@type": "PropertyValue",
          "propertyID": "viabilityScore",
          "value": 6
        },
        {
          "@type": "PropertyValue",
          "propertyID": "researchDomain",
          "value": "Emotion Recognition"
        }
      ]
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "name": "Home",
          "item": "https://sciencetostartup.com"
        },
        {
          "@type": "ListItem",
          "position": 2,
          "name": "Emotion Recognition",
          "item": "https://sciencetostartup.com/topics"
        },
        {
          "@type": "ListItem",
          "position": 3,
          "name": "Multimodal Emotion Recognition via Bi-directional Cross-Atte",
          "item": "https://sciencetostartup.com/paper/multimodal-emotion-recognition-via-bi-directional-cross-attention-and-temporal-modeling"
        }
      ]
    }
  ]
}
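
For readers who want to pull fields out of this JSON-LD payload, the following sketch walks the `@graph` array, locates the `ScholarlyArticle` node, and collects the Semantic Scholar identifiers of its citations along with the additional properties. It operates on a shortened copy of the payload (two of the twenty citations); the helper name `find_node` is hypothetical, and the code relies only on the structure visible above.

```python
import json

# Shortened copy of the JSON-LD twin: same structure, only two citations kept.
JSON_LD = """
{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "WebPage", "name": "Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling"},
    {
      "@type": "ScholarlyArticle",
      "sameAs": "https://arxiv.org/abs/2603.11971",
      "identifier": {"@type": "PropertyValue", "propertyID": "arXiv", "value": "2603.11971"},
      "citation": [
        {"@type": "ScholarlyArticle",
         "identifier": {"@type": "PropertyValue", "propertyID": "SemanticScholar",
                        "value": "cb7e32527530b01dc4402127b42eedd2ae5b8e8f"}},
        {"@type": "ScholarlyArticle",
         "identifier": {"@type": "PropertyValue", "propertyID": "SemanticScholar",
                        "value": "3bc4d2c1291ba2dc9c1b8b2f3a16f5cc6517975b"}}
      ],
      "additionalProperty": [
        {"@type": "PropertyValue", "propertyID": "viabilityScore", "value": 6},
        {"@type": "PropertyValue", "propertyID": "researchDomain", "value": "Emotion Recognition"}
      ]
    }
  ]
}
"""


def find_node(graph, node_type):
    """Return the first node in the graph with the given @type, or None."""
    for node in graph:
        if node.get("@type") == node_type:
            return node
    return None


document = json.loads(JSON_LD)
article = find_node(document["@graph"], "ScholarlyArticle")

# Collect Semantic Scholar paper IDs from the citation list.
citation_ids = [
    entry["identifier"]["value"]
    for entry in article["citation"]
    if entry["identifier"]["propertyID"] == "SemanticScholar"
]

# Flatten additionalProperty into a plain dictionary.
properties = {item["propertyID"]: item["value"] for item in article["additionalProperty"]}

print(article["identifier"]["value"], len(citation_ids), properties)
```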
