ARXIV:2603.11698 · TEXT-TO-VIDEO GENERATION · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified · Source: PDF linked | Partial · PaperPack: 3 of 4 citation fields filled | Missing · Missing fields: authors | Partial · Proof: unverified proof status

OSCBench: Benchmarking Object State Change in Text-to-Video Generation

arXiv

OSCBench is a new benchmark for evaluating object state change in text-to-video generation models.

Blocked on Code › Score 4.0 · Evidence unverified

Opportunity summary

Pain OSCBench is a new benchmark for evaluating object state change in text-to-video generation models.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

OSCBench is a new benchmark for evaluating object state change in text-to-video generation models. Existing benchmarks primarily focus on perceptual quality, text-video alignment, or physical plausibility, leaving a critical aspect of action understanding…

METHOD

Full abstract

Text-to-video (T2V) generation models have made rapid progress in producing visually high-quality and temporally coherent videos. However, existing benchmarks primarily focus on perceptual quality, text-video alignment, or physical plausibility, leaving a critical aspect of action understanding largely unexplored: object state change (OSC) explicitly specified in the text prompt. OSC refers to the transformation of an object's state induced by an action, such as peeling a potato or slicing a lemon. In this paper, we introduce OSCBench, a benchmark specifically designed to assess OSC performance in T2V models. OSCBench is constructed from instructional cooking data and systematically organizes action-object interactions into regular, novel, and compositional scenarios to probe both in-distribution performance and generalization. We evaluate six representative open-source and proprietary T2V models using both human user study and multimodal large language model (MLLM)-based automatic evaluation. Our results show that, despite strong performance on semantic and scene alignment, current T2V models consistently struggle with accurate and temporally consistent object state changes, especially in novel and compositional settings. These findings position OSC as a key bottleneck in text-to-video generation and establish OSCBench as a diagnostic benchmark for advancing state-aware video generation models.
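
One way to read the regular / novel / compositional split described in the abstract is as a per-scenario aggregation of OSC ratings. The sketch below is only an illustration under stated assumptions, not the paper's evaluation code: the `Rating` record, the `osc_score` field and its numeric scale, and both function names are hypothetical, and no benchmark data or results are included.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Scenario names taken from the abstract; everything else here is hypothetical.
SCENARIOS = ("regular", "novel", "compositional")


@dataclass(frozen=True)
class Rating:
    """One rating of one generated video (assumed record layout)."""
    model: str        # name of the T2V model being rated
    scenario: str     # one of SCENARIOS
    osc_score: float  # assumed numeric OSC rating from a human or an MLLM judge


def mean_by_scenario(ratings):
    """Average OSC score for every (model, scenario) pair that has ratings."""
    buckets = defaultdict(list)
    for r in ratings:
        if r.scenario not in SCENARIOS:
            raise ValueError(f"unknown scenario: {r.scenario!r}")
        buckets[(r.model, r.scenario)].append(r.osc_score)
    return {key: mean(scores) for key, scores in buckets.items()}


def generalization_gap(means, model, scenario):
    """Drop in mean OSC score from the regular scenario to another scenario."""
    return means[(model, "regular")] - means[(model, scenario)]
```

Given a list of `Rating` records, `mean_by_scenario` yields one average per model and scenario, and `generalization_gap(means, model, "novel")` expresses how much lower a model scores outside the regular setting. A positive gap would correspond to the kind of degradation in novel and compositional settings that the abstract describes.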

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. Our results show that, despite strong performance on semantic and scene alignment, current T2V models consistently struggle with accurate and temporally consistent object state…

WHY NOW

Text-to-Video Generation moved forward this cycle; last verified April 2026. Public score 4.0/10.

Opportunity summary

Score 4.0

Pain OSCBench is a new benchmark for evaluating object state change in text-to-video generation models.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

OSCBench is a new benchmark for evaluating object state change in text-to-video generation models.

Competitive landscape

OSCBench is a new benchmark for evaluating object state change in text-to-video generation models.

Segment

Text-to-Video Generation

Adoption evidence

No public code link in the paper record yet

Commercial read

4.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Published: 2026-03-12T09:08:01.000Z · arXiv: https://arxiv.org/abs/2603.11698 · Research domain: Text-to-Video Generation · Free to access: yes
