ARXIV:2603.09400 · AGENTS · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified | Source: PDF linked

Partial | PaperPack: 3 of 4 citation fields filled

Missing | Missing fields: authors

Partial | Proof: unverified proof status

Reward Prediction with Factorized World States

arXiv

StateFactory transforms unstructured observations into structured representations for accurate reward prediction across diverse domains.

Blocked on Code · Score 8.0 · Evidence unverified

Opportunity summary

Pain: StateFactory transforms unstructured observations into structured representations for accurate reward prediction across diverse domains.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: Evidence unverified

PROBLEM

Supervised learning of reward models could introduce biases inherent to training data, limiting generalization to novel goals and environments.

METHOD

Full abstract

Agents must infer action outcomes and select actions that maximize a reward signal indicating how close the goal is to being reached. Supervised learning of reward models could introduce biases inherent to training data, limiting generalization to novel goals and environments. In this paper, we investigate whether well-defined world state representations alone can enable accurate reward prediction across domains. To address this, we introduce StateFactory, a factorized representation method that transforms unstructured observations into a hierarchical object-attribute structure using language models. This structured representation allows rewards to be estimated naturally as the semantic similarity between the current state and the goal state under hierarchical constraint. Overall, the compact representation structure induced by StateFactory enables strong reward generalization capabilities. We evaluate on RewardPrediction, a new benchmark dataset spanning five diverse domains and comprising 2,454 unique action-observation trajectories with step-wise ground-truth rewards. Our method shows promising zero-shot results against both VLWM-critic and LLM-as-a-Judge reward models, achieving 60% and 8% lower EPIC distance, respectively. Furthermore, this superior reward quality successfully translates into improved agent planning performance, yielding success rate gains of +21.64% on AlfWorld and +12.40% on ScienceWorld over reactive system-1 policies and enhancing system-2 agent planning. Project Page: https://statefactory.github.io
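The abstract describes rewards as the semantic similarity between the current state and the goal state, computed over a hierarchical object-attribute structure. The sketch below illustrates that idea only and is not the paper's implementation. It makes several labeled assumptions: the `State` layout (object name mapped to attribute name mapped to a text value), the helper names `token_similarity` and `estimate_reward`, and the use of plain token overlap (Jaccard) in place of the language-model-based similarity the paper relies on. The hierarchical constraint is approximated by comparing an attribute only against the same-named attribute of the same-named object.

```python
from typing import Dict

# Hypothetical factorized state: object name -> {attribute name -> text value}.
# The paper builds such structures with language models; here they are hand-written.
State = Dict[str, Dict[str, str]]


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (a stand-in for semantic similarity)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def estimate_reward(current: State, goal: State) -> float:
    """Mean over goal objects of the mean attribute similarity.

    Attributes are matched only within the same object, which is the
    assumed reading of "hierarchical constraint" in this sketch.
    """
    if not goal:
        return 1.0
    object_scores = []
    for obj, goal_attrs in goal.items():
        current_attrs = current.get(obj)
        if current_attrs is None:
            object_scores.append(0.0)  # goal object not observed at all
            continue
        if not goal_attrs:
            object_scores.append(1.0)  # object present, no attribute required
            continue
        sims = [
            token_similarity(current_attrs.get(attr, ""), value)
            for attr, value in goal_attrs.items()
        ]
        object_scores.append(sum(sims) / len(sims))
    return sum(object_scores) / len(object_scores)


# Illustrative states (invented for this example, not taken from the benchmark).
goal_state: State = {"mug": {"location": "on shelf", "state": "clean"}}
current_state: State = {"mug": {"location": "in sink", "state": "clean"}}

print(estimate_reward(current_state, goal_state))  # 0.5
```

In this toy case one of the two goal attributes already matches, so the estimate is 0.5, and it rises to 1.0 once the mug's location matches the goal. A step-wise reward of this form needs no reward-model training, which is the property the abstract's zero-shot claim depends on.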

RESULT

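ScienceToStartup currently rates this 8.0/10 on the public viability pass. The abstract reports zero-shot results of 60% and 8% lower EPIC distance than the VLWM-critic and LLM-as-a-Judge reward models, respectively, and success rate gains of +21.64% on AlfWorld and +12.40% on ScienceWorld over reactive system-1 policies.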

WHY NOW

Agents moved forward this cycle; last verified April 2026. Public score 8.0/10.

Competitive landscape

Segment: Agents

Adoption evidence: No public code link in the paper record yet

Commercial read: 8.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

arXiv: https://arxiv.org/abs/2603.09400 · Published: 2026-03-10T09:12:20Z · Free to access · Page: https://sciencetostartup.com/paper/reward-prediction-with-factorized-world-states
