ARXIV:2605.02860 · CODE CLONE DETECTION · SUBMITTED 05 MAY · 20:29 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: citation fields available (verified) · Proof: unverified proof status (partial)

Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection

Mohamad Khajezade · Fatemeh H. Fard · Mohamed Sami Shehata · arXiv

Knowledge distillation with response stabilization techniques makes compact open-source models more practical and reliable for cross-language code clone detection.

Blocked on: Code · Score: 5.0 · Evidence unverified

Opportunity summary

Pain: Knowledge distillation with response stabilization techniques makes compact open-source models more practical and reliable for cross-language code clone detection.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: evidence unverified


PROBLEM

Knowledge distillation with response stabilization techniques makes compact open-source models more practical and reliable for cross-language code clone detection. Although large language models (LLMs) have shown promise for semantic clone detection, their use as…

METHOD

Full abstract

Cross-language code clone detection (X-CCD) is challenging because semantically equivalent programs written in different languages often share little surface similarity. Although large language models (LLMs) have shown promise for semantic clone detection, their use as black-box systems raises concerns about cost, reproducibility, privacy, and unreliable output formatting. In particular, compact open-source models often struggle to follow reasoning-oriented prompts and to produce outputs that can be consistently mapped to binary clone labels. To address these limitations, we propose a knowledge distillation framework that transfers reasoning capabilities from DeepSeek-R1 into compact open-source student models for X-CCD. Using cross-language code pairs derived from Project CodeNet, we construct reasoning-oriented synthetic training data and fine-tune Phi3 and Qwen-Coder with LoRA adapters. We further introduce response stabilization methods, including forced conclusion prompting, a binary classification head, and a contrastive classification head, and evaluate model behavior using both predictive metrics and response rate. Experiments on Python–Java, Rust–Java, Rust–Python, and Rust–Ruby show that knowledge distillation consistently improves the reliability of compact models and often improves predictive performance, especially under distribution shift. In addition, classification-head variants substantially reduce inference time compared to generation-based inference. Overall, our results show that reasoning-oriented distillation combined with response stabilization makes compact open-source models more practical and reliable for X-CCD.
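The abstract names forced conclusion prompting as one way to make free-text reasoning map consistently onto a binary clone label. A minimal sketch of the generation-based path, assuming a hypothetical instruction that asks the model to end with a literal `Final answer: Yes` or `Final answer: No` line (the paper's actual prompt wording is not given on this page; `FORCED_CONCLUSION`, `build_prompt`, and `parse_label` are illustrative names), pairs a prompt builder with a strict parser that returns `None` when no verdict can be recovered:

```python
import re
from typing import Optional

# Hypothetical forced-conclusion instruction; the paper's exact wording is not given.
FORCED_CONCLUSION = (
    "After your reasoning, end with exactly one line: "
    "'Final answer: Yes' if the two programs are semantically equivalent, "
    "otherwise 'Final answer: No'."
)

# Case-insensitive match for "Final answer: Yes" / "Final answer: No".
_ANSWER_RE = re.compile(r"final answer\s*:\s*(yes|no)\b", re.IGNORECASE)


def build_prompt(code_a: str, lang_a: str, code_b: str, lang_b: str) -> str:
    """Assemble a reasoning prompt for one cross-language code pair (illustrative)."""
    return (
        f"Are the following {lang_a} and {lang_b} programs semantically equivalent?\n\n"
        f"Program A ({lang_a}):\n{code_a}\n\n"
        f"Program B ({lang_b}):\n{code_b}\n\n"
        f"Reason step by step. {FORCED_CONCLUSION}"
    )


def parse_label(response: str) -> Optional[int]:
    """Return 1 (clone), 0 (non-clone), or None when no verdict can be recovered."""
    verdicts = _ANSWER_RE.findall(response)
    if not verdicts:
        return None  # unusable output: counts against the response rate
    # If the model states several verdicts, trust the last one.
    return 1 if verdicts[-1].lower() == "yes" else 0
```

For example, `parse_label("...both compute the same sum.\nFinal answer: Yes")` returns `1`, while a response that trails off without the final line returns `None`. Keeping the parser strict makes formatting failures visible instead of silently guessing a label; per the abstract, the classification-head variants avoid this parsing step and the cost of autoregressive generation altogether.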

RESULT

ScienceToStartup currently rates this 5.0/10 on the public viability pass. Experiments on Python–Java, Rust–Java, Rust–Python, and Rust–Ruby show that knowledge distillation consistently improves the reliability of compact models and often improves predictive performance, especially…
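The results separate reliability (whether the model produces a usable answer at all) from predictive performance. A minimal sketch of scoring both from parsed outputs, assuming that `None` marks a response that could not be mapped to a label, that response rate is the fraction of pairs with a usable label, and that precision, recall, and F1 are computed only over answered pairs (the paper may instead count non-responses as errors; `evaluate` is a hypothetical helper):

```python
from typing import Optional, Sequence


def evaluate(preds: Sequence[Optional[int]], gold: Sequence[int]) -> dict:
    """Score parsed predictions (1, 0, or None) against gold clone labels."""
    if len(preds) != len(gold):
        raise ValueError("preds and gold must have the same length")

    # Reliability: share of pairs that yielded a usable binary label.
    answered = [(p, g) for p, g in zip(preds, gold) if p is not None]
    response_rate = len(answered) / len(gold) if gold else 0.0

    # Confusion counts over answered pairs only (assumption, see lead-in).
    tp = sum(1 for p, g in answered if p == 1 and g == 1)
    fp = sum(1 for p, g in answered if p == 1 and g == 0)
    fn = sum(1 for p, g in answered if p == 0 and g == 1)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"response_rate": response_rate, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, `evaluate([1, 0, None, 1], [1, 0, 1, 0])` gives a response rate of 0.75, precision 0.5, recall 1.0, and an F1 of about 0.667. Reporting the response rate alongside F1 keeps a model that answers rarely but accurately from looking better than it is.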

WHY NOW

Code Clone Detection moved forward this cycle; last verified May 2026. Public score 5.0/10.




Competitive landscape


Segment: Code Clone Detection

Adoption evidence: No public code link in the paper record yet

Commercial read: 5.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

