ARXIV:2602.02262 · SOFTWARE ENGINEERING AGENTS · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (verified) | PaperPack: 3 of 4 citation fields filled (partial) | Missing fields: authors | Proof: unverified (partial)

OmniCode: A Benchmark for Evaluating Software Engineering Agents

arXiv

OmniCode offers a comprehensive benchmark for evaluating software engineering agents across diverse tasks and programming languages.

Blocked on Code › Score 5.0 | Evidence unverified

Opportunity summary

Pain OmniCode offers a comprehensive benchmark for evaluating software engineering agents across diverse tasks and programming languages.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

OmniCode offers a comprehensive benchmark for evaluating software engineering agents across diverse tasks and programming languages. To drive the research towards better coding agents, we require challenging benchmarks that can rigorously evaluate the ability…

METHOD

Full abstract

LLM-powered coding agents are redefining how real-world software is developed. To drive the research towards better coding agents, we require challenging benchmarks that can rigorously evaluate the ability of such agents to perform various software engineering tasks. However, popular coding benchmarks such as HumanEval and SWE-Bench focus on narrowly scoped tasks such as competition programming and patch generation. In reality, software engineers have to handle a broader set of tasks for real-world software development. To address this gap, we propose OmniCode, a novel software engineering benchmark that contains a broader and more diverse set of task categories beyond code or patch generation. Overall, OmniCode contains 1794 tasks spanning three programming languages (Python, Java, and C++) and four key categories: bug fixing, test generation, code review fixing, and style fixing. In contrast to prior software engineering benchmarks, the tasks in OmniCode are (1) manually validated to eliminate ill-defined problems, and (2) synthetically crafted or recently curated to avoid data leakage issues, presenting a new framework for synthetically generating diverse software tasks from limited real-world data. We evaluate OmniCode with popular agent frameworks such as SWE-Agent and show that while they may perform well on bug fixing for Python, they fall short on tasks such as Test Generation and in languages such as C++ and Java. For instance, SWE-Agent achieves a maximum of 20.9% with DeepSeek-V3.1 on Java Test Generation tasks. OmniCode aims to serve as a robust benchmark and spur the development of agents that can perform well across different aspects of software development. Code and data are available at https://github.com/seal-research/OmniCode.
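The task grid described in the abstract (three languages by four categories) can be sketched as a small data structure for tallying per-cell results. This is a minimal illustration under stated assumptions, not the OmniCode harness: the `TaskResult` record, its field names, the `resolve_rates` helper, and the sample outcomes are all hypothetical, and the data format in the actual repository may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product

# Languages and task categories as listed in the abstract.
LANGUAGES = ("python", "java", "cpp")
CATEGORIES = ("bug_fixing", "test_generation", "code_review_fixing", "style_fixing")


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical record of one agent attempt; not the benchmark's real schema."""
    task_id: str
    language: str
    category: str
    resolved: bool


def resolve_rates(results):
    """Return {(language, category): fraction resolved} for every grid cell.

    Cells with no attempts map to None rather than 0.0, so "no data" is not
    mistaken for "all attempts failed".
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [resolved, total]
    for r in results:
        if r.language not in LANGUAGES or r.category not in CATEGORIES:
            raise ValueError(f"unknown cell for task {r.task_id}")
        cell = counts[(r.language, r.category)]
        cell[0] += int(r.resolved)
        cell[1] += 1
    rates = {}
    for key in product(LANGUAGES, CATEGORIES):
        resolved, total = counts.get(key, (0, 0))
        rates[key] = resolved / total if total else None
    return rates


# Made-up outcomes purely for illustration; these are not results from the paper.
sample = [
    TaskResult("t1", "python", "bug_fixing", True),
    TaskResult("t2", "python", "bug_fixing", False),
    TaskResult("t3", "java", "test_generation", False),
]
rates = resolve_rates(sample)
print(len(rates))                         # number of grid cells
print(rates[("python", "bug_fixing")])    # fraction resolved in one cell
```

Reporting one rate per (language, category) cell, rather than a single aggregate, matches how the abstract frames its findings: strong results in one cell (Python bug fixing) can coexist with weak ones elsewhere (Java test generation).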

RESULT

ScienceToStartup currently rates this 5.0/10 on the public viability pass. We evaluate OmniCode with popular agent frameworks such as SWE-Agent and show that while they may perform well on bug fixing for Python, they…

WHY NOW

Software Engineering Agents moved forward this cycle; last verified April 2026. Public score 5.0/10.

Opportunity summary

Score 5.0

Pain OmniCode offers a comprehensive benchmark for evaluating software engineering agents across diverse tasks and programming languages.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

OmniCode offers a comprehensive benchmark for evaluating software engineering agents across diverse tasks and programming languages.

Competitive landscape

OmniCode offers a comprehensive benchmark for evaluating software engineering agents across diverse tasks and programming languages.

Segment

Software Engineering Agents

Adoption evidence

No public code link in the paper record yet, although the abstract points to https://github.com/seal-research/OmniCode

Commercial read

5.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Published: 2026-02-02T16:04:10.000Z | arXiv: https://arxiv.org/abs/2602.02262 | Page: https://sciencetostartup.com/paper/omnicode-a-benchmark-for-evaluating-software-engineering-agents
