ARXIV:2602.11087 · REINFORCEMENT LEARNING · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (verified) | PaperPack: 3 of 4 citation fields filled (partial) | Missing fields: authors | Proof: unverified proof status (partial)

General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies

arXiv

Improve offline RL performance using flexible $f$-divergence constraints for better dataset adaptation.

Blocked on Code | Score 5.0 | Evidence unverified

Opportunity summary

Pain Improve offline RL performance using flexible $f$-divergence constraints for better dataset adaptation.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

The aim is to improve offline RL performance using flexible $f$-divergence constraints for better dataset adaptation. Practical offline datasets, however, often contain examples with little diversity or limited exploration of the environment, and come from multiple behavior policies with…

METHOD

Full abstract

Offline RL algorithms aim to improve upon the behavior policy that produces the collected data while constraining the learned policy to be within the support of the dataset. However, practical offline datasets often contain examples with little diversity or limited exploration of the environment, and from multiple behavior policies with diverse expertise levels. Limited exploration can impair the offline RL algorithm's ability to estimate \textit{Q} or \textit{V} values, while constraining towards diverse behavior policies can be overly conservative. Such datasets call for a balance between the RL objective and behavior policy constraints. We first identify the connection between $f$-divergence and optimization constraint on the Bellman residual through a more general Linear Programming form for RL and the convex conjugate. Following this, we introduce the general flexible function formulation for the $f$-divergence to incorporate an adaptive constraint on algorithms' learning objectives based on the offline training dataset. Results from experiments on the MuJoCo, Fetch, and AdroitHand environments show the correctness of the proposed LP form and the potential of the flexible $f$-divergence in improving performance for learning from a challenging dataset when applied to a compatible constrained optimization algorithm.
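As background for the abstract's terms, the standard definitions are $D_f(P \| Q) = \mathbb{E}_Q[f(dP/dQ)]$ for a convex $f$ with $f(1) = 0$, and the convex conjugate $f^*(y) = \sup_t \{ty - f(t)\}$. The sketch below evaluates both on small discrete distributions. It illustrates only these textbook objects, not the paper's flexible $f$ formulation or its LP form; the two action distributions and all function names are hypothetical.

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete distributions.

    Terms with q(x) == 0 are skipped, so P is assumed to be supported inside Q.
    """
    return sum(qx * f(px / qx) for px, qx in zip(p, q) if qx > 0)

def f_kl(t):
    # Generator of the KL divergence: f(t) = t log t, with f(0) = 0 by continuity.
    return t * math.log(t) if t > 0 else 0.0

def f_chi2(t):
    # Generator of the Pearson chi-square divergence: f(t) = (t - 1)^2.
    return (t - 1.0) ** 2

def conjugate_numeric(f, y, t_max=50.0, steps=200_000):
    """Grid approximation of f*(y) = sup over t in [0, t_max] of (t * y - f(t))."""
    grid = (t_max * i / steps for i in range(steps + 1))
    return max(y * t - f(t) for t in grid)

# Hypothetical action distributions at a single state (illustration only).
behavior = [0.7, 0.2, 0.1]  # distribution implied by the offline dataset
learned = [0.4, 0.4, 0.2]   # distribution of a candidate learned policy

kl = f_divergence(learned, behavior, f_kl)
chi2 = f_divergence(learned, behavior, f_chi2)

# Closed forms for comparison: f_kl*(y) = exp(y - 1); f_chi2*(y) = y + y^2 / 4 for y >= -2.
kl_conj = conjugate_numeric(f_kl, 0.5)
chi2_conj = conjugate_numeric(f_chi2, 0.5)
```

Different generators penalize the same mismatch between the learned and behavior distributions differently, which is the degree of freedom a choice of $f$ exposes.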

RESULT

ScienceToStartup currently rates this 5.0/10 on the public viability pass. Offline RL algorithms aim to improve upon the behavior policy that produces the collected data while constraining the learned policy to be within the…

WHY NOW

Reinforcement Learning moved forward this cycle; last verified April 2026. Public score 5.0/10.

Opportunity summary

Score 5.0

Pain Improve offline RL performance using flexible $f$-divergence constraints for better dataset adaptation.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

Improve offline RL performance using flexible $f$-divergence constraints for better dataset adaptation.

Paper Pack

10.48550/arXiv.2602.11087

General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies

Improve offline RL performance using flexible $f$-divergence constraints for better dataset adaptation.

Abstract

Source availability

PDF linked

The paper record includes a public PDF URL.

Extraction status

Derived fallback

Read summaries are estimated from adjacent metadata, not verified extraction rows.

Proof status

unverified

0 refs; 0 sources; 17% coverage.

What was readable

linked | on file | not materialized | derived fallback | 28 indexed | not indexed

Derived fallback: Estimated from adjacent evidence; not verified from source.

Viability

5.0

Time to MVP

MVP estimate missing

Commercial

No commercial flags on file

Claim map

Abstract-backed public claims while anchored extraction refreshes.

Strong 0 | Mixed 0 | Weak 4

Evidence (partial): Improve offline RL performance using flexible $f$-divergence constraints for better dataset adaptation. However, practical offline datasets often contain examples with little diversity or limited exploration of the environment, and from multiple behavior policies with diverse expertise levels.
Implication (partial): Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification: partial

Evidence (partial): Offline RL algorithms aim to improve upon the behavior policy that produces the collected data while constraining the learned policy to be within the support of the dataset. However, practical offline datasets often contain examples with little diversity or limited exploration of the environment, and from multiple behavior policies with diverse expertise levels.
Implication (partial): Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification: partial

Evidence (partial): ScienceToStartup currently rates this 5.0/10 on the public viability pass. Offline RL algorithms aim to improve upon the behavior policy that produces the collected data while constraining the learned policy to be within the support of the dataset.
Implication (partial): Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification: partial

Evidence (partial): Reinforcement Learning moved forward this cycle; last verified April 2026. Public score 5.0/10.
Implication (partial): Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification: partial

Constellation map

Paper-native neighborhood for concepts, methods, materials, markets, and competitors. Missing lanes stay labeled instead of disappearing behind commercialization gates.

Concepts

not indexed

Methods

Materials

PDF linked

Markets

Reinforcement Learning

Competitors

not indexed

Competitive landscape

Improve offline RL performance using flexible $f$-divergence constraints for better dataset adaptation.

Segment

Reinforcement Learning

Adoption evidence

No public code link in the paper record yet

Commercial read

5.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Buzz

No indexed public discussion is attached to 2602.11087 yet. That is a visibility signal, not a blank module: the monitor is watching the public channels below.

Hacker News

Not indexed yet

Bluesky

Not indexed yet

References(28)

M. Drnevich, S. Jiggins et al. (2024). Neural quasiprobabilistic likelihood ratio estimation with negatively weighted data
Woo-Seong Kim, Donghyeon Ki et al. (2024). Relaxed Stationary Distribution Correction Estimation for Improved Offline Policy Optimization
H. Walke, Kevin Black et al. (2023). BridgeData V2: A Dataset for Robot Learning at Scale
Shashi Kant Singh, Shubham Kumar et al. (2023). Chat GPT & Google Bard AI: A Review
Haoran Xu, Li Jiang et al. (2023). Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
Lantao Yu, Tianhe Yu et al. (2023). Offline Imitation Learning with Suboptimal Demonstrations via Relaxed Distribution Matching
Harshit S. Sikchi, Qinqing Zheng et al. (2023). Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
Divyansh Garg, Joey Hejna et al. (2023). Extreme Q-Learning: MaxEnt RL without Entropy
Kajetan Schweighofer, Andreas Radler et al. (2021). A Dataset Perspective on Offline Reinforcement Learning
Ilya Kostrikov, Ashvin Nair et al. (2021). Offline Reinforcement Learning with Implicit Q-Learning
Jongmin Lee, Wonseok Jeon et al. (2021). OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
Scott Fujimoto, S. Gu (2021). A Minimalist Approach to Offline Reinforcement Learning
Paria Rashidinejad, Banghua Zhu et al. (2021). Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
Julian Ibarz, Jie Tan et al. (2021). How to train your robot with deep reinforcement learning: lessons we have learned
Aviral Kumar, Aurick Zhou et al. (2020). Conservative Q-Learning for Offline Reinforcement Learning
S. Levine, Aviral Kumar et al. (2020). Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Justin Fu, Aviral Kumar et al. (2020). D4RL: Datasets for Deep Data-Driven Reinforcement Learning
Ruiyi Zhang, Bo Dai et al. (2020). GenDICE: Generalized Offline Estimation of Stationary Values
Ofir Nachum, Bo Dai et al. (2019). AlgaeDICE: Policy Gradient from Arbitrary Experience
Ofir Nachum, Yinlam Chow et al. (2019). DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

Showing 20 of 28 references

CITED BY

No citing papers are indexed in the public S2S graph yet. This is an explicit zero-signal state, not a hidden lookup.

Foundation

Prior Work: FORLER: Federated Offline Reinforcement Learning with Q-Ensemble and Actor Rectification | 5.0
Prior Work: Selecting Offline Reinforcement Learning Algorithms for Stochastic Network Control | 5.0

Extension

Builds On This: $f$-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses | 0.0
Builds On This: Offline Constrained RLHF with Multiple Preference Oracles | 3.0
Builds On This: Residuals-based Offline Reinforcement Learning | 4.0

Commercially relevant

Higher Viability: Discrete Flow Matching for Offline-to-Online Reinforcement Learning | 6.0
Higher Viability: Robust Regularized Policy Iteration under Transition Uncertainty | 7.0

Conflicting

Competing Approach: Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning | 4.0
Competing Approach: ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation | 5.0
Competing Approach: Learning Optimal and Sample-Efficient Decision Policies with Guarantees | 5.0

Related Resources

Just-In-Time Reinforcement Learning (glossary)
Multi-Agent Reinforcement Learning (glossary)
Multi-Agent Test-Time Reinforcement Learning (MATTRL) (glossary)
How does PRISM improve reinforcement learning? (question)
What is the significance of reinforcement learning in AI? (question)
How does RetroAgent improve reinforcement learning? (question)
Reinforcement Learning – Use Cases (use_case)

Agent drawer

5 surfaces preserved for agents. Humans can ignore.

Developer contracts, payload previews, evidence maps, and run controls stay here instead of the Read, Build, and Track workspace.

Run context

Paper: 2602.11087
Route: /paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies
Active tab: read
Artifact: general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies

Available agents

Read extractor
Build planner
Track monitor
Competitive mapper
Related-paper scout

API/MCP endpoints

REST paper pack API: /api/v1/paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies/paper-pack
REST build passport API: /api/v1/paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies/build-passport
REST OpenAPI: /api/openapi.json
MCP descriptor: /api/mcp
MCP resource: sciencetostartup://surfaces/paper-workspace

Tool contracts

paper_pack, build_passport, opportunity_kernel, foresight, source_proof, evidence_state

Payload preview

Inspect payload

{
  "contract_version": "paper-r2",
  "paper_id": "f366a477-dcba-4ae5-8e23-eadca0603536",
  "arxiv_id": "2602.11087",
  "canonical_route": "/paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies",
  "active_tab": "synced from current hash by the drawer client",
  "selected_artifact": "general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies",
  "endpoints": {
    "paper_pack": "/api/v1/paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies/paper-pack",
    "build_passport": "/api/v1/paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies/build-passport",
    "mcp_resource": "sciencetostartup://surfaces/paper-workspace"
  }
}

Schema validation

paper-r2 contract: present
JSON-LD twin: SSR emitted
OpenAPI path parity: /api/openapi.json
MCP resource parity: paper-workspace

Job trace

queued: drawer opened by user action
running: inspect or copy payload
succeeded: payload available in SSR
failed: route errors appear in evidence cards

Evidence map

sources used: page freshness, source proof anchors, JSON-LD
missing sources: exposed by PaperPack and EvidenceState chips
derived fallbacks: marked unverified before handoff

Page Freshness

Canonical route, proof status, last verified, refs, sources, and coverage.

Paper proof surface

Canonical route: /paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies

stale

Proof freshness: stale
Proof status: unverified
Display score: 5/10
Last proof check: 2026-04-02
Score updated: 2026-04-02
Score fresh until: 2026-05-02
References: 0
Source count: 0
Coverage: 17%

This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.

OpenAlex: pending — this preprint is not yet indexed by OpenAlex.

Agent Handoff

Endpoint list, payload shape, route context, and copyable handoff data.

General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies

Canonical ID general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies | Route /paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies
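The same request can be sketched in Python with the standard library. The URL pattern is taken from the curl line above; the helper names are hypothetical, and the assumption that the endpoint answers with a JSON body is unverified, since this page does not document the response schema.

```python
import json
import urllib.request

BASE_URL = "https://sciencetostartup.com"
SLUG = (
    "general-flexible-f-divergence-for-challenging-offline-rl-datasets-"
    "with-low-stochasticity-and-diverse-behavior-policies"
)

def handoff_url(slug, base_url=BASE_URL):
    """Build the agent-handoff URL used in the REST example."""
    return f"{base_url}/api/v1/agent-handoff/paper/{slug}"

def fetch_handoff(slug, timeout=10.0):
    """GET the handoff document and decode it.

    Assumption: the body is JSON. The response schema is not documented here,
    so the caller should inspect the result before relying on any field.
    """
    request = urllib.request.Request(
        handoff_url(slug), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_handoff(SLUG)` performs the network request; `handoff_url(SLUG)` alone only builds the address and can be checked offline.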

MCP example

{
  "tool": "get_paper",
  "arguments": {
    "arxiv_id": "2602.11087"
  }
}

source_context

{
  "surface": "paper",
  "mode": "paper",
  "query": "General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies",
  "normalized_query": "2602.11087",
  "route": "/paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies",
  "paper_ref": "general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Buildability Receipt

Verdict, compute envelope, blockers, signature state, and receipt links.

Paper proof page receipt window

Watch and verify: General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies

/buildability/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies

Subject: General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies

Verdict

Watch

Verdict is Watch because viability or proof quality is intermediate and should be re-evaluated before execution.

Time to first demo

Insufficient data

No first-demo timestamp, owner estimate, or elapsed demo receipt is attached to this surface.

Compute envelope

Structured compute envelope

Insufficient data

No data, compute, hardware, memory, latency, dependency, or serving requirement receipt is attached.

Evidence ids

Receipt path

/buildability/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies

Paper ref

general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies

arXiv id

2602.11087

Freshness

Generated at

2026-04-02T02:30:40.136Z

Evidence freshness

stale

Last verification

2026-04-02T02:30:40.136Z

Sources

References

Coverage

17%

Hash state

Lineage hash

9721c1343768ffcfbb9990576070a77e4722d61ccf7e876115b826f244ee78c4

Canonical opportunity-kernel lineage hash.

Signature state

External signature

unsigned_external

No founder, registry, pilot, or production-adoption signature is attached to this receipt.

Verification

not_verified

Verification is blocked until an external signature is provided.

Blockers

Missing: repo_url
Missing: references
Missing: proof_status
Missing: distribution_readiness_scores
Missing: paper_extraction_scorecards
Unknown: distribution readiness has not been computed yet
Unknown: proof verification has not been recorded yet

Verification pending / evidence receipt incomplete

repo_url

references

Missing proof, requirement, signature, approval, adoption, or telemetry fields are blockers and must not be inferred.

Source Proof anchors

Visual citations from the paper document graph.

JSON-LD twin

The application/ld+json payload rendered for agents.

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebPage",
      "@id": "https://sciencetostartup.com/paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies#webpage",
      "url": "https://sciencetostartup.com/paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies",
      "name": "General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies",
      "description": "Improve offline RL performance using flexible $f$-divergence constraints for better dataset adaptation.",
      "isPartOf": {
        "@id": "https://sciencetostartup.com/#website"
      }
    },
    {
      "@type": "ScholarlyArticle",
      "@id": "https://sciencetostartup.com/paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies#scholarlyArticle",
      "headline": "General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies",
      "description": "Improve offline RL performance using flexible $f$-divergence constraints for better dataset adaptation.",
      "url": "https://sciencetostartup.com/paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies",
      "sameAs": "https://arxiv.org/abs/2602.11087",
      "identifier": {
        "@type": "PropertyValue",
        "propertyID": "arXiv",
        "value": "2602.11087"
      },
      "isAccessibleForFree": true,
      "isPartOf": {
        "@id": "https://sciencetostartup.com/#website"
      },
      "datePublished": "2026-02-11T17:53:49.000Z",
      "citation": [
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "8b158378098ae7996f9292dfe46654e8ef8d844a"
          },
          "url": "https://www.semanticscholar.org/paper/8b158378098ae7996f9292dfe46654e8ef8d844a"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "ca5d80551936ea2aca2ee7797813e06902436a07"
          },
          "url": "https://www.semanticscholar.org/paper/ca5d80551936ea2aca2ee7797813e06902436a07"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "6a375be82efc01ec4ed73334655935a56ba82d38"
          },
          "url": "https://www.semanticscholar.org/paper/6a375be82efc01ec4ed73334655935a56ba82d38"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "d60d023ece16906fee709beb9cf7fb60e3a19f3d"
          },
          "url": "https://www.semanticscholar.org/paper/d60d023ece16906fee709beb9cf7fb60e3a19f3d"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "18d82f2a4aa1e2c1c4b447876c95b8f7e717e1a1"
          },
          "url": "https://www.semanticscholar.org/paper/18d82f2a4aa1e2c1c4b447876c95b8f7e717e1a1"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "d05a0088e6ab6ba367650527c7e1cc46524da3dc"
          },
          "url": "https://www.semanticscholar.org/paper/d05a0088e6ab6ba367650527c7e1cc46524da3dc"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "672ec9fa4ddb5b6bfc46c61c5b2f4bdfa1aa8ed9"
          },
          "url": "https://www.semanticscholar.org/paper/672ec9fa4ddb5b6bfc46c61c5b2f4bdfa1aa8ed9"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "2c2180fbe7f38e88b1123e5fab43785b66814e5d"
          },
          "url": "https://www.semanticscholar.org/paper/2c2180fbe7f38e88b1123e5fab43785b66814e5d"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "6f7301ec378541eacb3968176e26eb16ae3dc941"
          },
          "url": "https://www.semanticscholar.org/paper/6f7301ec378541eacb3968176e26eb16ae3dc941"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "348a855fe01f3f4273bf0ecf851ca688686dbfcc"
          },
          "url": "https://www.semanticscholar.org/paper/348a855fe01f3f4273bf0ecf851ca688686dbfcc"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "eee1a516cb9a5cc3ac41dd3edd9b7d3727f7dd73"
          },
          "url": "https://www.semanticscholar.org/paper/eee1a516cb9a5cc3ac41dd3edd9b7d3727f7dd73"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "c879b25308026d6538e52b27bcf4fd3cb60855f3"
          },
          "url": "https://www.semanticscholar.org/paper/c879b25308026d6538e52b27bcf4fd3cb60855f3"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "0bcc734246586966596b1aa22efac2565224ebee"
          },
          "url": "https://www.semanticscholar.org/paper/0bcc734246586966596b1aa22efac2565224ebee"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "400811ee31020a3f002551476dac25973e13035e"
          },
          "url": "https://www.semanticscholar.org/paper/400811ee31020a3f002551476dac25973e13035e"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "28db20a81eec74a50204686c3cf796c42a020d2e"
          },
          "url": "https://www.semanticscholar.org/paper/28db20a81eec74a50204686c3cf796c42a020d2e"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "5e7bc93622416f14e6948a500278bfbe58cd3890"
          },
          "url": "https://www.semanticscholar.org/paper/5e7bc93622416f14e6948a500278bfbe58cd3890"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "a326d9f2d2d351001fece788165dbcbb524da2e4"
          },
          "url": "https://www.semanticscholar.org/paper/a326d9f2d2d351001fece788165dbcbb524da2e4"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "2b0e79ed1340a79344e37b6f57191b76d810962f"
          },
          "url": "https://www.semanticscholar.org/paper/2b0e79ed1340a79344e37b6f57191b76d810962f"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "27d78a26ddb9698b9cefcf6cdeafa4f834466103"
          },
          "url": "https://www.semanticscholar.org/paper/27d78a26ddb9698b9cefcf6cdeafa4f834466103"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "875280d96b2f138902061ae6409249ee4ded0da3"
          },
          "url": "https://www.semanticscholar.org/paper/875280d96b2f138902061ae6409249ee4ded0da3"
        }
      ],
      "additionalProperty": [
        {
          "@type": "PropertyValue",
          "propertyID": "viabilityScore",
          "value": 5
        },
        {
          "@type": "PropertyValue",
          "propertyID": "researchDomain",
          "value": "Reinforcement Learning"
        }
      ]
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "name": "Home",
          "item": "https://sciencetostartup.com"
        },
        {
          "@type": "ListItem",
          "position": 2,
          "name": "Reinforcement Learning",
          "item": "https://sciencetostartup.com/topics"
        },
        {
          "@type": "ListItem",
          "position": 3,
          "name": "General Flexible $f$-divergence for Challenging Offline RL D",
          "item": "https://sciencetostartup.com/paper/general-flexible-f-divergence-for-challenging-offline-rl-datasets-with-low-stochasticity-and-diverse-behavior-policies"
        }
      ]
    }
  ]
}
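An agent consuming this payload would typically pick the ScholarlyArticle node out of `@graph` and read its identifiers. A minimal sketch follows, using an abbreviated copy of the payload above (one citation kept, most fields dropped); the `summarize` helper and its output keys are hypothetical names chosen for illustration.

```python
import json

# Abbreviated copy of the JSON-LD twin: same structure, one citation kept.
JSON_LD = """
{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "WebPage"},
    {
      "@type": "ScholarlyArticle",
      "identifier": {"@type": "PropertyValue", "propertyID": "arXiv", "value": "2602.11087"},
      "citation": [
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "8b158378098ae7996f9292dfe46654e8ef8d844a"
          }
        }
      ],
      "additionalProperty": [
        {"@type": "PropertyValue", "propertyID": "viabilityScore", "value": 5},
        {"@type": "PropertyValue", "propertyID": "researchDomain", "value": "Reinforcement Learning"}
      ]
    }
  ]
}
"""

def summarize(doc):
    """Extract the arXiv id, cited Semantic Scholar ids, and extra properties."""
    # The first ScholarlyArticle in @graph is the paper itself; cited papers
    # are nested under its "citation" key rather than listed in @graph.
    article = next(n for n in doc["@graph"] if n.get("@type") == "ScholarlyArticle")
    props = {p["propertyID"]: p["value"] for p in article.get("additionalProperty", [])}
    return {
        "arxiv_id": article["identifier"]["value"],
        "citation_ids": [c["identifier"]["value"] for c in article.get("citation", [])],
        "viability_score": props.get("viabilityScore"),
        "research_domain": props.get("researchDomain"),
    }

summary = summarize(json.loads(JSON_LD))
```

Run against the full payload, `citation_ids` would hold the 20 Semantic Scholar identifiers listed under `citation`.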
