ARXIV:2603.09436 · OFF-POLICY EVALUATION · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: 3 of 4 citation fields filled (partial) · Missing fields: authors · Proof: unverified proof status (partial)

From Weighting to Modeling: A Nonparametric Estimator for Off-Policy Evaluation

arXiv

A novel nonparametric estimator for off-policy evaluation in contextual bandits that reduces variance while maintaining low bias.

Blocked on Code · Score 2.0 · Evidence unverified

Opportunity summary

Pain A novel nonparametric estimator for off-policy evaluation in contextual bandits that reduces variance while maintaining low bias.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

Off-policy evaluation in contextual bandits has to score a new policy from historical data, and that data typically does not faithfully represent the action distribution of the new policy. The paper proposes a novel nonparametric estimator that reduces variance while maintaining low bias.
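The mismatch can be written out in standard contextual-bandit notation. The symbols below (logging policy \mu, target policy \pi, logged tuples (x_i, a_i, r_i)) are assumptions for illustration; this page does not define notation, and the paper's own symbols may differ.

```latex
% Assumed notation: contexts x_i, logged actions a_i, rewards r_i,
% logging policy \mu (produced the data), target policy \pi (to be evaluated).
V(\pi) = \mathbb{E}_{x}\,\mathbb{E}_{a \sim \pi(\cdot \mid x)}\bigl[\, r(x, a) \,\bigr],
\qquad
\hat{V}_{\mathrm{IPW}} = \frac{1}{n} \sum_{i=1}^{n}
  \frac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)}\, r_i .
```

The ratio \pi / \mu grows without bound as \mu(a_i \mid x_i) approaches zero, so actions the logging policy rarely chose contribute large, noisy terms to the sum.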

METHOD

Full abstract

We study off-policy evaluation in the setting of contextual bandits, where we aim to evaluate a new policy using historical data that consists of contexts, actions and received rewards. This historical data typically does not faithfully represent action distribution of the new policy accurately. A common approach, inverse probability weighting (IPW), adjusts for these discrepancies in action distributions. However, this method often suffers from high variance due to the probability being in the denominator. The doubly robust (DR) estimator reduces variance through modeling reward but does not directly address variance from IPW. In this work, we address the limitation of IPW by proposing a Nonparametric Weighting (NW) approach that constructs weights using a nonparametric model. Our NW approach achieves low bias like IPW but typically exhibits significantly lower variance. To further reduce variance, we incorporate reward predictions -- similar to the DR technique -- resulting in the Model-assisted Nonparametric Weighting (MNW) approach. The MNW approach yields accurate value estimates by explicitly modeling and mitigating bias from reward modeling, without aiming to guarantee the standard doubly robust property. Extensive empirical comparisons show that our approaches consistently outperform existing techniques, achieving lower variance in value estimation while maintaining low bias.
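The two baselines named in the abstract, IPW and DR, can be sketched in a few lines. This is a minimal illustration on synthetic data, not the paper's code: the function names, the context-free toy problem, and the deliberately biased reward model `q_hat` are all assumptions made here. The NW and MNW estimators are not shown because this page does not describe how their weights are constructed.

```python
import numpy as np


def ipw_estimate(pi_e, pi_b, actions, rewards):
    """IPW: mean of rewards reweighted by pi_e(a|x) / pi_b(a|x)."""
    idx = np.arange(len(actions))
    weights = pi_e[idx, actions] / pi_b[idx, actions]
    return float(np.mean(weights * rewards))


def dr_estimate(pi_e, pi_b, actions, rewards, q_hat):
    """DR: model-based value plus an IPW correction on the model residual."""
    idx = np.arange(len(actions))
    weights = pi_e[idx, actions] / pi_b[idx, actions]
    direct = np.sum(pi_e * q_hat, axis=1)        # E_{a ~ pi_e}[q_hat(x, a)]
    residual = rewards - q_hat[idx, actions]     # what the model got wrong
    return float(np.mean(direct + weights * residual))


# Hypothetical toy problem: 3 actions, rewards do not depend on context.
rng = np.random.default_rng(0)
n, k = 5000, 3
true_q = np.tile([0.2, 0.5, 0.8], (n, 1))        # expected reward per action
pi_b = np.tile([0.6, 0.3, 0.1], (n, 1))          # logging (behavior) policy
pi_e = np.tile([0.1, 0.3, 0.6], (n, 1))          # new policy to evaluate

actions = rng.choice(k, size=n, p=[0.6, 0.3, 0.1])
rewards = rng.binomial(1, true_q[np.arange(n), actions])

q_hat = true_q + 0.1                             # deliberately biased reward model
true_value = float(np.sum(pi_e[0] * true_q[0]))  # 0.1*0.2 + 0.3*0.5 + 0.6*0.8

ipw_value = ipw_estimate(pi_e, pi_b, actions, rewards)
dr_value = dr_estimate(pi_e, pi_b, actions, rewards, q_hat)
print(f"true={true_value:.3f}  ipw={ipw_value:.3f}  dr={dr_value:.3f}")
```

In this toy setup the rarest logged action carries a weight of 0.6 / 0.1 = 6, which is the denominator effect the abstract describes. DR keeps the same weights but applies them only to the residual of the reward model, so its variance depends on how well `q_hat` fits.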

RESULT

ScienceToStartup currently rates this 2.0/10 on the public viability pass. The authors report that their NW approach achieves low bias like IPW but typically exhibits significantly lower variance.

WHY NOW

Off-Policy Evaluation moved forward this cycle; last verified April 2026. Public score 2.0/10.

Opportunity summary

Score 2.0

Pain A novel nonparametric estimator for off-policy evaluation in contextual bandits that reduces variance while maintaining low bias.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

A novel nonparametric estimator for off-policy evaluation in contextual bandits that reduces variance while maintaining low bias.

Competitive landscape

A novel nonparametric estimator for off-policy evaluation in contextual bandits that reduces variance while maintaining low bias.

Segment: Off-Policy Evaluation
Adoption evidence: No public code link in the paper record yet
Commercial read: 2.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified

arXiv: https://arxiv.org/abs/2603.09436 · Published 2026-03-10T09:48:22Z · Free to access · Page: https://sciencetostartup.com/paper/from-weighting-to-modeling-a-nonparametric-estimator-for-off-policy-evaluation
