ARXIV:2603.05900 · DRUG DISCOVERY · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (verified)
PaperPack: 3 of 4 citation fields filled (partial)
Missing fields: authors
Proof: unverified proof status (partial)

Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning

arXiv

Optimize molecule design using LLMs with reference-guided policy optimization, balancing exploration and exploitation for improved performance and generalization.

Blocked on Code | Score 7.0 | Evidence unverified

Opportunity summary

Pain Optimize molecule design using LLMs with reference-guided policy optimization, balancing exploration and exploitation for improved performance and generalization.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified


PROBLEM

Supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR) substantially help large language models (LLMs) in reasoning tasks. However, these recipes perform poorly in instruction-based molecular optimization, where each data point typically provides only…

METHOD

Full abstract

Large language models (LLMs) benefit substantially from supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR) in reasoning tasks. However, these recipes perform poorly in instruction-based molecular optimization, where each data point typically provides only a single optimized reference molecule and no step-by-step optimization trajectory. We reveal that answer-only SFT on the reference molecules collapses reasoning, and RLVR provides sparse feedback under similarity constraints due to the model's lack of effective exploration, which slows learning and limits optimization. To encourage the exploration of new molecules while balancing the exploitation of the reference molecules, we introduce Reference-guided Policy Optimization (RePO), an optimization approach that learns from reference molecules without requiring trajectory data. At each update, RePO samples candidate molecules with their intermediate reasoning trajectories from the model and trains the model using verifiable rewards that measure property satisfaction under similarity constraints in an RL manner. Meanwhile, it applies reference guidance by keeping the policy's intermediate reasoning trajectory as context and training only the answer in a supervised manner. Together, the RL term promotes exploration, while the guidance term mitigates reward sparsity and stabilizes training by grounding outputs to references when many valid molecular edits exist. Across molecular optimization benchmarks, RePO consistently outperforms SFT and RLVR baselines (e.g., GRPO), achieving improvements on the optimization metric (Success Rate $\times$ Similarity), improving balance across competing objectives, and generalizing better to unseen instruction styles. Our code is publicly available at https://github.com/tmlr-group/RePO.
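The two-term objective described in the abstract can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the group-normalized advantage (in the style of GRPO), the binary reward, the similarity threshold `sim_threshold`, the mixing weight `guidance_weight`, and every name below (`Rollout`, `repo_style_terms`, `repo_style_loss`) are hypothetical choices made for illustration. Log-probabilities are passed in as plain numbers so the example runs without a model.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple


@dataclass
class Rollout:
    """One sampled completion: a reasoning trace plus a candidate molecule."""
    logp_sample: float      # log-prob of (reasoning + sampled answer) under the policy
    logp_ref_answer: float  # log-prob of the reference answer given the same reasoning
    property_ok: bool       # did the candidate satisfy the requested property change?
    similarity: float       # similarity to the input molecule, in [0, 1]


def verifiable_reward(r: Rollout, sim_threshold: float = 0.4) -> float:
    """Assumed binary reward: property satisfied AND similarity constraint met."""
    return 1.0 if (r.property_ok and r.similarity >= sim_threshold) else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one sampled group (assumed GRPO-style baseline)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


def repo_style_terms(group: List[Rollout]) -> Tuple[float, float]:
    """Return (rl_term, guidance_term) for one group of rollouts."""
    rewards = [verifiable_reward(r) for r in group]
    advs = group_advantages(rewards)
    # RL term: raise the log-prob of samples with above-average reward.
    rl_term = -mean([a * r.logp_sample for a, r in zip(advs, group)])
    # Guidance term: supervised loss on the reference answer only,
    # with the policy's own sampled reasoning kept as context.
    guidance_term = -mean([r.logp_ref_answer for r in group])
    return rl_term, guidance_term


def repo_style_loss(group: List[Rollout], guidance_weight: float = 0.5) -> float:
    """Weighted sum of the two terms; the weight is a hypothetical knob."""
    rl_term, guidance_term = repo_style_terms(group)
    return rl_term + guidance_weight * guidance_term


if __name__ == "__main__":
    # Every sample fails the reward: the RL term vanishes, guidance does not.
    sparse = [Rollout(-12.0, -3.0, False, 0.9), Rollout(-10.0, -5.0, True, 0.1)]
    print(repo_style_terms(sparse))
```

When every sample in a group earns zero reward, the normalized advantages are all zero and the RL term contributes nothing, while the guidance term still provides a training signal. That is one way to read the abstract's claim that reference guidance mitigates reward sparsity.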

RESULT

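Per the abstract, RePO consistently outperforms SFT and RLVR baselines (e.g., GRPO) on the optimization metric (Success Rate $\times$ Similarity) and generalizes better to unseen instruction styles. ScienceToStartup currently rates this 7.0/10 on the public viability pass. The authors' code is publicly available at https://github.com/tmlr-group/RePO.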

WHY NOW

Drug Discovery moved forward this cycle; last verified April 2026. Public score 7.0/10.


Opportunity summary

Score 7.0

Pain Optimize molecule design using LLMs with reference-guided policy optimization, balancing exploration and exploitation for improved performance and generalization.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

Optimize molecule design using LLMs with reference-guided policy optimization, balancing exploration and exploitation for improved performance and generalization.


Competitive landscape

Optimize molecule design using LLMs with reference-guided policy optimization, balancing exploration and exploitation for improved performance and generalization.

Segment

Drug Discovery

Adoption evidence

Public code at https://github.com/tmlr-group/RePO (stated in the abstract; not yet linked in the paper record)

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


Authors

Bo Han (Hong Kong Baptist University), Xuan Li (Hong Kong Baptist University), Zhanke Zhou (Hong Kong Baptist University), Zongze Li (Hong Kong Baptist University), Jiangchao Yao (CMIC, Shanghai Jiao Tong University), Yu Rong (DAMO Academy, Alibaba Group), Lu Zhang (Hong Kong Baptist University)

Published

2026-03-06 04:39:08 UTC | https://arxiv.org/abs/2603.05900

Page

https://sciencetostartup.com/paper/reference-guided-policy-optimization-for-molecular-optimization-via-llm-reasoning

FAQ

What is the startup potential of "Reference-guided Policy Optimization for Molecular Optimizat"?

Optimize molecular properties using LLM-driven policy optimization for drug design.

What products could be built from this research?

Develop a SaaS platform that leverages the AI model to offer subscription-based molecular optimization services for drug discovery and materials science firms.

What are the practical use cases?

An AI tool for pharmaceutical companies to perform efficient and precise molecular optimizations, allowing for faster drug candidate evaluations with more promising properties.

What industries could this research disrupt?

Could disrupt traditional, less efficient molecular modeling approaches that rely heavily on manual adjustments and chemical expertise.


