ARXIV:2604.02162 · AI EVALUATION & BENCHMARKING · SUBMITTED 03 APR · 20:50 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

Beyond the Fold: Quantifying Split-Level Noise and the Case for Leave-One-Dataset-Out AU Evaluation

Saurabh Hinduja · Gurmeet Kaur · Maneesh Bilalpur · Jeffrey Cohn · Shaun Canavan · arXiv

A novel evaluation protocol for facial action unit detection that quantifies noise and improves robustness, revealing that many reported gains may be artifacts of the testing method.

Ship in 2-4 weeks · Score 3.0 · Evidence unverified

Opportunity summary

Pain: A novel evaluation protocol for facial action unit detection that quantifies noise and improves robustness, revealing that many reported gains may be artifacts of the testing method.

Evidence: 0 refs | 0 sources | 33% coverage

Blocker: Evidence unverified


PROBLEM

A novel evaluation protocol for facial action unit detection that quantifies noise and improves robustness, revealing that many reported gains may be artifacts of the testing method. We show that cross-validation itself introduces measurable…

METHOD

Full abstract

Subject-exclusive cross-validation is the standard evaluation protocol for facial Action Unit (AU) detection, yet reported improvements are often small. We show that cross-validation itself introduces measurable stochastic variance. On BP4D+, repeated 3-fold subject-exclusive splits produce an empirical noise floor of $\pm 0.065$ in average F1, with substantially larger variation for low-prevalence AUs. Operating-point metrics such as F1 fluctuate more than threshold-independent measures such as AUC, and model ranking can change under different fold assignments. We further evaluate cross-dataset robustness using a Leave-One-Dataset-Out (LODO) protocol across five AU datasets. LODO removes partition randomness and exposes domain-level instability that is not visible under single-dataset cross-validation. Together, these results suggest that gains often reported in cross-fold validation may fall within protocol variance. Leave-one-dataset-out cross-validation yields more stable and interpretable findings.
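The LODO protocol described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the dataset names "A", "B", "C", the one-dimensional toy scores, and the midpoint-threshold "model" are all assumptions made up for the example (the paper uses five AU datasets and real detectors). The point it shows is that the partition is fixed by dataset identity, so no random seed enters the split.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature score, binary AU label); hypothetical toy format


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1; returns 0.0 when there are no true or predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fit_threshold(train: Sequence[Sample]) -> float:
    """Toy stand-in for a model: midpoint between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        return 0.5  # arbitrary fallback when one class is missing
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def lodo_splits(
    datasets: Dict[str, List[Sample]]
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_name, train, test); the split depends only on dataset names."""
    names = sorted(datasets)
    for held_out in names:
        train = [s for name in names if name != held_out for s in datasets[name]]
        yield held_out, train, datasets[held_out]


def evaluate_lodo(datasets: Dict[str, List[Sample]]) -> Dict[str, float]:
    """Train on all datasets but one, test on the held-out one, report F1 per dataset."""
    results = {}
    for held_out, train, test in lodo_splits(datasets):
        thr = fit_threshold(train)
        preds = [1 if x >= thr else 0 for x, _ in test]
        results[held_out] = f1_score([y for _, y in test], preds)
    return results


# Hypothetical toy datasets; "C" has shifted scores to mimic a domain gap.
toy_datasets = {
    "A": [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)],
    "B": [(0.3, 0), (0.4, 0), (1.0, 1), (1.1, 1)],
    "C": [(1.0, 0), (1.1, 0), (1.7, 1), (1.8, 1)],
}

for name, score in evaluate_lodo(toy_datasets).items():
    print(name, round(score, 3))
# prints: A 0.0, B 1.0, C 0.667 (one per line)
```

On this toy data the held-out F1 swings from 0.0 to 1.0 depending on which dataset is left out, which is the kind of domain-level instability a single-dataset cross-validation would not reveal. Rerunning the script gives identical numbers because nothing in the split is random.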

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass. We show that cross-validation itself introduces measurable stochastic variance. Code availability is flagged in the production record; the public repository link still needs proof…
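One way to see how cross-validation alone can produce variance is to repeat subject-exclusive k-fold with different fold assignments and look at the spread of the averaged F1. The sketch below is an assumption-laden illustration, not the paper's procedure: the synthetic per-subject data, the threshold "model", the 20 repeats, and the choice of standard deviation and half-range as spread statistics are all hypothetical. How the paper defines its reported noise floor is not stated in this record.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Frame = Tuple[float, int]  # (detector score, binary AU label); toy format


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1; 0.0 when there are no true or predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def make_toy_subjects(n_subjects: int = 30, frames: int = 40, seed: int = 0) -> Dict[int, List[Frame]]:
    """Synthetic data: each subject gets its own score offset and AU prevalence."""
    rng = random.Random(seed)
    data = {}
    for s in range(n_subjects):
        offset = rng.gauss(0.0, 0.5)
        prevalence = rng.uniform(0.05, 0.4)
        labels = [1 if rng.random() < prevalence else 0 for _ in range(frames)]
        data[s] = [(y + offset + rng.gauss(0.0, 0.5), y) for y in labels]
    return data


def subject_exclusive_folds(subjects: Sequence[int], k: int, seed: int) -> List[List[int]]:
    """Shuffle subject IDs with the given seed and deal them into k disjoint folds."""
    order = list(subjects)
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def fit_threshold(train: Sequence[Frame]) -> float:
    """Toy stand-in for a model: midpoint between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        return 0.5  # arbitrary fallback when one class is missing
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cv_mean_f1(data: Dict[int, List[Frame]], k: int, seed: int) -> float:
    """Average test F1 over k subject-exclusive folds for one fold assignment."""
    scores = []
    for test_subjects in subject_exclusive_folds(sorted(data), k, seed):
        held_out = set(test_subjects)
        train = [f for s, frames in data.items() if s not in held_out for f in frames]
        test = [f for s in test_subjects for f in data[s]]
        thr = fit_threshold(train)
        scores.append(f1([y for _, y in test], [int(x >= thr) for x, _ in test]))
    return statistics.mean(scores)


def split_noise(data: Dict[int, List[Frame]], k: int = 3, repeats: int = 20) -> Dict[str, float]:
    """Spread of the fold-averaged F1 when only the fold assignment changes."""
    means = [cv_mean_f1(data, k, seed) for seed in range(repeats)]
    return {
        "mean": statistics.mean(means),
        "std": statistics.stdev(means),
        "half_range": (max(means) - min(means)) / 2,
    }


data = make_toy_subjects()
print(split_noise(data))
```

The data and the model are held fixed across repeats, so any spread in the printed summary comes from the fold assignment alone. A claimed improvement smaller than that spread would be hard to separate from split-level noise.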

WHY NOW

AI Evaluation & Benchmarking moved forward this cycle; last verified April 2026. Public score 3.0/10. Production flags indicate code availability.




Competitive landscape

A novel evaluation protocol for facial action unit detection that quantifies noise and improves robustness, revealing that many reported gains may be artifacts of the testing method.

Segment: AI Evaluation & Benchmarking

Adoption evidence: No public code link in the paper record yet

Commercial read: 3.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


