ARXIV:2602.05780 · AI-ENHANCED SOFTWARE DEVELOPMENT · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: 3 of 4 citation fields filled (Partial) · Missing fields: authors (Missing) · Proof: unverified proof status (Partial)

Automated Customization of LLMs for Enterprise Code Repositories Using Semantic Scopes

arXiv

Auto-customized LLMs for efficient and precise code completion in proprietary repositories.

Blocked on Code · Score 7.0 · Evidence unverified

Opportunity summary

Pain Auto-customized LLMs for efficient and precise code completion in proprietary repositories.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

Auto-customized LLMs for efficient and precise code completion in proprietary repositories. Despite the increased performance of LLMs on public benchmarks, out-of-the-box LLMs still have a hard time generating code that aligns…

METHOD

Full abstract

Code completion (CC) is a task developers use frequently when working with LLM-based programming assistants. Despite the increased performance of LLMs on public benchmarks, out-of-the-box LLMs still have a hard time generating code that aligns with a private code repository absent from the model's training data. Customizing code LLMs to a private repository provides a way to improve model performance. In this paper we present our approach for automated LLM customization based on semantic scopes in the code. We evaluate LLMs on real industry cases, using two private enterprise code repositories and two customization strategies: Retrieval-Augmented Generation (RAG) and supervised Fine-Tuning (FT). Our mechanism for ingesting the repository's data and formulating the training data pairs with semantic scopes helps models learn the underlying patterns specific to the repository, providing more precise code to developers and helping to boost their productivity. The code completions of moderately sized customized models can be significantly better than those of uncustomized models of much larger capacity. We also include an analysis of customization on two public benchmarks and present opportunities for future work.
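The abstract does not spell out how semantic scopes are extracted or how the training pairs are laid out, so the following is only a minimal sketch under stated assumptions: it treats Python function and class bodies as the "semantic scopes", uses the standard-library ast module to find them, and pairs the code preceding each body (the prompt) with the body itself (the target). The names CompletionPair and scope_pairs are hypothetical and do not come from the paper.

```python
import ast
from dataclasses import dataclass


@dataclass
class CompletionPair:
    prompt: str  # all source text before the scope body, including its header
    target: str  # the scope body a model would be asked to complete


def scope_pairs(source: str) -> list[CompletionPair]:
    """Build one (prompt, target) pair per function or class scope.

    Assumption (not from the paper): a scope is a def/class node, and its
    body starts on the line after the header. One-line definitions such as
    `def f(): return 1` are not handled specially.
    """
    lines = source.splitlines(keepends=True)
    scope_types = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, scope_types):
            continue
        # lineno is 1-based, so subtract 1 to index into `lines`.
        body_start = node.body[0].lineno - 1
        # end_lineno is 1-based inclusive, which equals a 0-based exclusive end.
        body_end = node.end_lineno
        pairs.append(
            CompletionPair(
                prompt="".join(lines[:body_start]),
                target="".join(lines[body_start:body_end]),
            )
        )
    return pairs
```

Pairs of this shape could serve either strategy named in the abstract: as supervised examples for fine-tuning, or as scope-sized chunks to index for retrieval. Whether the paper uses them this way is not stated in this summary.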

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Customizing code LLMs to a private repository provides a way to improve the model performance.

WHY NOW

AI-enhanced Software Development moved forward this cycle; last verified April 2026. Public score 7.0/10.

Opportunity summary

Score 7.0

Pain Auto-customized LLMs for efficient and precise code completion in proprietary repositories.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

Auto-customized LLMs for efficient and precise code completion in proprietary repositories.

Competitive landscape

Auto-customized LLMs for efficient and precise code completion in proprietary repositories.

Segment: AI-enhanced Software Development

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Paper details

Title: Automated Customization of LLMs for Enterprise Code Repositories Using Semantic Scopes

Authors: Ulrich Finkler, Irene Manotas, Wei Zhang, Geert Janssen, Octavian Popescu, Shyam Ramji (all IBM Research)

Published: 2026-02-05

arXiv: https://arxiv.org/abs/2602.05780

Research domain: AI-enhanced Software Development

Viability score: 7

FAQ

What is the startup potential of "Automated Customization of LLMs for Enterprise Code Reposito"?

Auto-customized LLMs for efficient and precise code completion in proprietary repositories.

What products could be built from this research?

The product would be a plugin for code editors that integrates seamlessly into enterprise environments, automatically aligning code completions with the company's existing code base style and conventions.

What are the practical use cases?

Develop an enterprise tool that automatically customizes pre-trained language models to enhance code completion features within proprietary software repositories, reducing development time and increasing accuracy.

What industries could this research disrupt?

This solution could replace traditional manual tuning practices and generic code completion plugins that do not cater to individual code base styles, providing a more tailored and efficient approach.