ARXIV:2605.03485 · VISION-LANGUAGE MODELS · SUBMITTED 06 MAY · 20:27 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Language Models

Kangkang Wang · Qinting Jiang · Wanping Zhang · Bowen Ren · Shengzhao Wen · arXiv

A benchmark and data generation pipeline for evaluating multidimensional human perception and reasoning in large vision-language models.

Ship in 2-4 weeks · Score 4.0 · Evidence unverified

Opportunity summary

Pain A benchmark and data generation pipeline for evaluating multidimensional human perception and reasoning in large vision-language models.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

A benchmark and data generation pipeline for evaluating multidimensional human perception and reasoning in large vision-language models. In this work, we introduce MHPR, a comprehensive benchmark for joint perception-reasoning over human-centric scenes spanning individual,…

METHOD

Full abstract

Multidimensional human understanding is essential for real-world applications such as film analysis and virtual digital humans, yet current LVLM benchmarks largely focus on single-task settings and lack fine-grained, human-centric evaluation. In this work, we introduce MHPR, a comprehensive benchmark for joint perception-reasoning over human-centric scenes spanning individual, multi-person, and human-object interaction dimensions. MHPR comprises a multi-level data design with four parts: Captioned Raw Data (C-RD), Supervised Fine-Tuning Data (SFT-D), Reinforcement Learning Data (RL-D), and Test Data (T-D). It is paired with an automated caption/VQA generation pipeline (ACVG) that performs category-wise attribute decomposition, attribute-specific rewriting, and multi-model voting to ensure high-quality, scalable annotations. We evaluate state-of-the-art vision-language models on fine-grained attributes (appearance, clothing, pose, parts) and high-level semantics (social relations, action semantics, spatial relations, intent and functionality). Our findings show that: 1) format-aligned SFT data substantially improves instruction following and stability; 2) challenge-focused RL data derived from bad-case analysis further enhances perception and reasoning on difficult instances; and 3) training Qwen2.5-VL-7B with MHPR yields significant gains, achieving near-parity with considerably larger models. We release ACVG and MHPR to facilitate reproducible, extensible research on human-centric perception and reasoning.
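The abstract names multi-model voting as the last quality gate in ACVG but does not say how votes are counted. The following is a minimal sketch of one plausible reading, a strict-majority filter over candidate VQA answers. The function names `majority_vote` and `filter_candidates`, the `min_agreement` threshold, the lowercase normalization, and the sample questions are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from typing import Optional


def majority_vote(answers: list[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common answer if its share is strictly above
    min_agreement, otherwise None (hypothetical voting rule)."""
    if not answers:
        return None
    # Assumed normalization: trim whitespace and ignore case.
    normalized = [a.strip().lower() for a in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner if count / len(normalized) > min_agreement else None


def filter_candidates(candidates: dict[str, list[str]]) -> dict[str, str]:
    """Keep only the questions whose model answers reach a majority."""
    kept = {}
    for question, answers in candidates.items():
        voted = majority_vote(answers)
        if voted is not None:
            kept[question] = voted
    return kept


# Illustrative data: three hypothetical models answer two questions.
votes = {
    "What is the person wearing?": ["A red jacket", "a red jacket", "a blue coat"],
    "What is the relation between the two people?": ["friends", "colleagues", "strangers"],
}
accepted = filter_candidates(votes)
print(accepted)
```

Under these assumptions the first question is kept, because two of three answers agree, and the second is dropped, because no answer reaches a majority. A real pipeline would likely need semantic rather than exact-string matching for free-form answers.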

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. Our findings show that: 1) format-aligned SFT data substantially improves instruction following and stability; 2) challenge-focused RL data derived from bad-case analysis further enhances…

WHY NOW

Vision-Language Models moved forward this cycle; last verified May 2026. Public score 4.0/10. Production flags indicate code availability.


Opportunity summary

Score 4.0

Pain A benchmark and data generation pipeline for evaluating multidimensional human perception and reasoning in large vision-language models.

Evidence 0 refs | 3 sources | 50% coverage

Blocker No shell-level blocker reported

Analysis summary

A benchmark and data generation pipeline for evaluating multidimensional human perception and reasoning in large vision-language models.


Competitive landscape

A benchmark and data generation pipeline for evaluating multidimensional human perception and reasoning in large vision-language models.

Segment: Vision-Language Models
Adoption evidence: No public code link in the paper record yet
Commercial read: 4.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified


{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/mhpr-multidimensional-human-perception-and-reasoning-benchmark-for-large-vision-languate-models#webpage", "url": "https://sciencetostartup.com/paper/mhpr-multidimensional-human-perception-and-reasoning-benchmark-for-large-vision-languate-models", "name": "MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Languate Models", "description": "A benchmark and data generation pipeline for evaluating multidimensional human perception and reasoning in large vision-language models.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/mhpr-multidimensional-human-perception-and-reasoning-benchmark-for-large-vision-languate-models#scholarlyArticle", "headline": "MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Languate Models", "description": "A benchmark and data generation pipeline for evaluating multidimensional human perception and reasoning in large vision-language models.", "url": "https://sciencetostartup.com/paper/mhpr-multidimensional-human-perception-and-reasoning-benchmark-for-large-vision-languate-models", "sameAs": "https://arxiv.org/abs/2605.03485", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2605.03485" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-05-05T08:20:48.000Z", "author": [ { "@type": "Person", "name": "Kangkang Wang" }, { "@type": "Person", "name": "Qinting Jiang" }, { "@type": "Person", "name": "Wanping Zhang" }, { "@type": "Person", "name": "Bowen Ren" }, { "@type": "Person", "name": "Shengzhao Wen" } ], "additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 4 }, { "@type": "PropertyValue", "propertyID": "researchDomain", "value": "Vision-Language Models" }, { "@type": "PropertyValue", "propertyID": "commercialReadiness", "value": "code" } ] }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "Vision-Language Models", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "MHPR: Multidimensional Human Perception and Reasoning Benchm", "item": "https://sciencetostartup.com/paper/mhpr-multidimensional-human-perception-and-reasoning-benchmark-for-large-vision-languate-models" } ] } ] }

