ARXIV:2605.21453 · AI FOR SOFTWARE ENGINEERING · SUBMITTED 21 MAY · 20:31 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Quality and Security Signals in AI-Generated Python Refactoring Pull Requests

Mohamed Almukhtar · Anwar Ghammam · Hua Ming · arXiv

Analyzes AI-generated Python refactoring pull requests to assess code quality and security impacts, revealing mixed outcomes and motivating better gating mechanisms.

Ship in 2-4 weeks · Score 6.0 · Evidence unverified

Opportunity summary

Pain: Analyzes AI-generated Python refactoring pull requests to assess code quality and security impacts, revealing mixed outcomes and motivating better gating mechanisms.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: Evidence unverified

PROBLEM

It remains unclear how agent-authored refactoring edits affect maintainability, code quality, and security…

METHOD

Full abstract

As AI agents increasingly contribute to code development and maintenance, there is still limited empirical evidence on the quality and risk characteristics of their changes in real-world projects, particularly for refactoring-oriented contributions. It remains unclear how agent-authored refactoring edits affect maintainability, code quality, and security once merged into GitHub repositories. To address this gap, we conduct an empirical study of Python refactoring pull requests (PRs) from the AIDev dataset. We analyze agentic refactoring PRs using PyQu, an ML-based quality assessment tool for Python, to quantify changes across five quality attributes, and we complement PyQu with domain-independent static analysis (Pylint and Bandit) to measure code quality and security issues before and after each change. Our results show that, on average, agentic commits improve a quality attribute in 22.5% of the studied changes, with usability improving most frequently (36.5%). At the same time, 24.17% of modified files introduce new Pylint issues, predominantly convention-level violations such as long lines, while 4.7% introduce new Bandit findings. From the observed diffs, we derive a taxonomy of 24 recurring change operations and map them to the lint and security findings they most commonly affect. Despite these mixed outcomes, developer acceptance is high: 73.5% of the analyzed PRs are merged, including cases that introduce new lint or security findings, often alongside the removal of existing issues. Overall, these findings highlight both the promise and current limitations of agentic refactoring, and motivate stronger tool-in-the-loop quality and security gating for AI-driven development workflows.
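The before/after comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Finding` record, the `diff_findings` and `gate` helpers, and the sample data are all hypothetical. It assumes Pylint and Bandit output has already been normalized into (tool, path, rule) triples. Findings are compared as multisets and line numbers are ignored, because a refactoring tends to shift lines without changing the underlying issue.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One normalized linter or scanner finding (hypothetical schema)."""
    tool: str  # e.g. "pylint" or "bandit"
    path: str  # file the finding was reported in
    rule: str  # e.g. "line-too-long" or "B602"


def diff_findings(before, after):
    """Return (introduced, removed) as Counters keyed by Finding.

    Counter subtraction keeps only positive counts, so each result holds
    the findings that appear more often on one side than on the other.
    """
    before_counts = Counter(before)
    after_counts = Counter(after)
    introduced = after_counts - before_counts
    removed = before_counts - after_counts
    return introduced, removed


def gate(introduced, blocking_tools=("bandit",)):
    """Hypothetical policy: pass only if no blocking tool has new findings."""
    return not any(f.tool in blocking_tools for f in introduced)


# Illustrative data only, not taken from the paper.
before = [
    Finding("pylint", "app/util.py", "unused-import"),
    Finding("pylint", "app/util.py", "line-too-long"),
]
after = [
    Finding("pylint", "app/util.py", "line-too-long"),
    Finding("pylint", "app/util.py", "line-too-long"),
    Finding("bandit", "app/util.py", "B602"),
]

introduced, removed = diff_findings(before, after)
passes = gate(introduced)
```

In this made-up example the change removes one lint issue while introducing one new convention-level issue and one new security finding. This mirrors the mixed outcome the abstract reports for merged PRs. A gate of this shape would flag the change on the security finding alone, while leaving convention-level findings as non-blocking.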

RESULT

ScienceToStartup currently rates this 6.0/10 on the public viability pass. Our results show that, on average, agentic commits improve a quality attribute in 22.5% of the studied changes, with usability improving most frequently (36.5%).…

WHY NOW

AI for Software Engineering moved forward this cycle; last verified May 2026. Public score 6.0/10. Production flags indicate code availability.

Opportunity summary

Blocker: no shell-level blocker reported

Competitive landscape

Segment: AI for Software Engineering

Adoption evidence: No public code link in the paper record yet

Commercial read: 6.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Source: https://arxiv.org/abs/2605.21453 · Published 2026-05-20T17:43:36.000Z · Free to access
