ARXIV:2603.26647 · REINFORCEMENT LEARNING · SUBMITTED 30 MAR · 22:29 UTC · FRESHNESS STALE

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability

Ashutosh Soni · Peizhong Ju · Atilla Eryilmaz · Ness B. Shroff · arXiv

A novel policy for multi-armed bandits that optimizes exploration in dynamic environments with side-observations and stochastic availability.

Ship in 2-4 weeks · Score 4.0 · Evidence unverified

Opportunity summary

Pain: A novel policy for multi-armed bandits that optimizes exploration in dynamic environments with side-observations and stochastic availability.

Evidence: 22 refs | 3 sources | 50% coverage

Blocker: Evidence unverified


PROBLEM

A novel policy for multi-armed bandits that optimizes exploration in dynamic environments with side-observations and stochastic availability. We use a bipartite graph to link actions to a set of unknowns, such that selecting an…

METHOD

Full abstract

We study the stochastic multi-armed bandit (MAB) problem where an underlying network structure enables side-observations across related actions. We use a bipartite graph to link actions to a set of unknowns, such that selecting an action reveals observations for all the unknowns it is connected to. While previous works rely on the assumption that all actions are permanently accessible, we investigate the more practical setting of stochastic availability, where the set of feasible actions (the "activation set") varies dynamically in each round. This framework models real-world systems with both structural dependencies and volatility, such as social networks where users provide side-information about their peers' preferences, yet are not always online to be queried. To address this challenge, we propose UCB-LP-A, a novel policy that leverages a Linear Programming (LP) approach to optimize exploration-exploitation trade-offs under stochastic availability. Unlike standard network bandit algorithms that assume constant access, UCB-LP-A computes an optimal sampling distribution over the realizable activation sets, ensuring that the necessary observations are gathered using only the currently active arms. We derive a theoretical upper bound on the regret of our policy, characterizing the impact of both the network structure and the activation probabilities. Finally, we demonstrate through numerical simulations that UCB-LP-A significantly outperforms existing heuristics that ignore either the side-information or the availability constraints.
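The observation model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's UCB-LP-A policy: the toy graph, the availability probabilities, the Bernoulli means, the independence of availability across actions, and the placeholder selection rule are all assumptions made for illustration. It shows only the two structural ingredients, namely that playing an action reveals every unknown it is linked to in the bipartite graph, and that the set of playable actions is redrawn each round.

```python
import random

# Hypothetical toy instance: each action maps to the unknowns it reveals.
GRAPH = {
    "a1": ["u1", "u2"],
    "a2": ["u2", "u3"],
    "a3": ["u3"],
}
# Assumed per-round availability probability of each action (independent draws).
AVAIL_PROB = {"a1": 0.5, "a2": 0.9, "a3": 1.0}
# Assumed Bernoulli mean of each unknown, hidden from the learner.
TRUE_MEAN = {"u1": 0.2, "u2": 0.6, "u3": 0.8}


def sample_activation_set(rng):
    """Draw the set of actions that can be played this round."""
    return [a for a, p in AVAIL_PROB.items() if rng.random() < p]


def observe(action, counts, sums, rng):
    """Playing an action yields one sample for every unknown linked to it."""
    for u in GRAPH[action]:
        counts[u] += 1
        sums[u] += 1.0 if rng.random() < TRUE_MEAN[u] else 0.0


def simulate(rounds, seed=0):
    rng = random.Random(seed)
    counts = {u: 0 for u in TRUE_MEAN}
    sums = {u: 0.0 for u in TRUE_MEAN}
    for _ in range(rounds):
        # "a3" has probability 1.0, so the activation set is never empty here.
        active = sample_activation_set(rng)
        # Placeholder rule (NOT UCB-LP-A): among active actions, play the one
        # whose least-observed linked unknown has the fewest samples so far.
        action = min(active, key=lambda a: min(counts[u] for u in GRAPH[a]))
        observe(action, counts, sums, rng)
    return counts, sums


if __name__ == "__main__":
    counts, sums = simulate(200)
    for u in sorted(counts):
        print(u, counts[u], round(sums[u] / counts[u], 3))
```

In this toy setup `u1` can only be observed through `a1`, which is available about half the time, so a sampling rule that ignores availability would under-explore it. That is the kind of coupling between graph structure and activation probabilities the abstract says the LP and the regret bound account for.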

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. We study the stochastic multi-armed bandit (MAB) problem where an underlying network structure enables side-observations across related actions. Code availability is flagged in the…

WHY NOW

Reinforcement Learning moved forward this cycle; last verified April 2026. Public score 4.0/10. Production flags indicate code availability.


Opportunity summary

Score: 4.0

Pain: A novel policy for multi-armed bandits that optimizes exploration in dynamic environments with side-observations and stochastic availability.

Evidence: 22 refs | 3 sources | 50% coverage

Blocker: no shell-level blocker reported

Analysis summary

A novel policy for multi-armed bandits that optimizes exploration in dynamic environments with side-observations and stochastic availability.


Competitive landscape

A novel policy for multi-armed bandits that optimizes exploration in dynamic environments with side-observations and stochastic availability.

Segment

Reinforcement Learning

Adoption evidence

No public code link in the paper record yet

Commercial read

4.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


Source: https://arxiv.org/abs/2603.26647 · Published 2026-03-27T17:50:42.000Z
