StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles. Leverage StoryMovie to improve semantic alignment in visual storytelling with precise dialogue and relationship attribution. Commercial viability score: 5/10 (Dataset Creation).
Use an AI coding agent to implement this research.
6mo ROI: 2-4x
3yr ROI: 10-20x
Lightweight AI tools can reach profitability quickly. At $500/mo average contract, 20 customers = $10K MRR by 6mo, 200+ by 3yr.
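The revenue math above is a straight multiplication of customer count by average contract value. A minimal sketch, assuming the flat $500/mo average contract stated above (the function name and growth figures are illustrative, not part of the research):

```python
def projected_mrr(customers: int, avg_contract: int = 500) -> int:
    """Monthly recurring revenue = customers x average monthly contract value."""
    return customers * avg_contract

# Figures from the analysis above: 20 customers at 6 months, 200 at 3 years.
print(projected_mrr(20))   # 10000  -> $10K MRR
print(projected_mrr(200))  # 100000 -> $100K MRR
```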
High Potential: 3/4 signals
Quick Build: 4/4 signals
Series A Potential: 3/4 signals
Sources used for this analysis
arXiv Paper: Full-text PDF analysis of the research paper
GitHub Repository: Code availability, stars, and contributor activity
Citation Network: Semantic Scholar citations and co-citation patterns
Community Predictions: Crowd-sourced unicorn probability assessments
Analysis model: GPT-4o · Last scored: 4/2/2026
This research tackles two common failure modes in visual storytelling, semantic inconsistency and hallucination, by integrating precise narrative context from movie scripts and subtitles, thereby improving the accuracy and authenticity of generated narratives.
The solution can be packaged as an API that film and media production companies integrate into pre- and post-production processes to enhance script consistency and reduce errors, leading to cleaner narrative delivery.
This replaces existing manual script editing and continuity management by automating the semantic synchronization of visual and narrative content, minimizing human error.
The media and entertainment industry, valued at over $100 billion annually, often faces challenges with script continuity and narrative consistency. Production companies will use this tool to ensure accuracy, thereby saving costs associated with post-production editing due to narrative errors.
Develop a script-writing assistant for filmmakers that ensures character interactions and dialogues are portrayed accurately, improving production efficiency in aligning visual scenes with the script.
This study introduces the StoryMovie dataset, which aligns visual storytelling data with movie scripts and subtitles to improve semantic accuracy. The method synchronizes dialogue from movie scripts with subtitle timing for accurate dialogue attribution, using Longest Common Subsequence (LCS) token matching. It then enhances a storytelling model by grounding stories in detailed context taken directly from the scripts, reducing semantic errors by drawing on information beyond visual cues.
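The LCS step above can be sketched as a standard dynamic-programming match between a script line and candidate subtitles. This is a minimal illustration, assuming whitespace tokenization and a best-overlap scoring heuristic; the function names and normalization are hypothetical, not the paper's exact pipeline:

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (classic DP)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def match_script_to_subtitle(script_line: str, subtitles: list[str]) -> str:
    """Pick the subtitle with the highest LCS token overlap with a script line.

    The matched subtitle's timestamp (not modeled here) would then attribute
    the script's speaker label to that moment in the movie.
    """
    script_tokens = script_line.lower().split()
    def overlap(sub: str) -> float:
        return lcs_length(script_tokens, sub.lower().split()) / max(len(script_tokens), 1)
    return max(subtitles, key=overlap)
```

For example, `match_script_to_subtitle("we have to leave tonight", ["We have to leave tonight.", "I can't do that."])` selects the first subtitle, since four of its five tokens appear in order in the script line.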
Using the StoryMovie dataset, the model was tested for its semantic alignment capabilities. Evaluation showed improved dialogue attribution and entity re-identification, achieving a 48.5% win rate over models without script grounding.
The model's alignment process depends heavily on the quality of available scripts and subtitles, which may not exist for every movie. It is also susceptible to misalignment when scripts or subtitles are poorly transcribed.