ARXIV:2605.14202 · LLM FOR SOFTWARE TESTING · SUBMITTED 15 MAY · 20:13 UTC

Source: PDF linked (verified) · PaperPack: citation fields available (verified) · Proof: unverified proof status (partial)

LLM-Based Robustness Testing of Microservice Applications: An Empirical Study

Hrushitha Goud Tigulla · Marco Vieira · arXiv

An empirical study investigating the effectiveness of different LLM prompt strategies for generating robustness test cases for microservice applications.

Status: blocked on code · Score 4.0 · Evidence unverified

Opportunity summary

Pain: An empirical study investigating the effectiveness of different LLM prompt strategies for generating robustness test cases for microservice applications.

Evidence: 0 refs | 0 sources | 0% coverage

Blocker: evidence unverified


PROBLEM

Malformed, missing, or boundary-value inputs to microservice APIs can cascade across dependent services. Robustness testing systematically exercises such inputs to expose server-side failures, but generating diverse, effective tests remains challenging. The study asks whether different LLMs and prompt strategies find different failures or converge on the same ones.
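To make the robustness-test oracle concrete, here is a minimal Python sketch, assuming that a server-side failure shows up as an HTTP 5xx status while graceful rejection of bad input shows up as 4xx. The endpoint `toy_create_order`, the wrapper `call_endpoint`, and the other names are hypothetical stand-ins, not code from the paper; a real harness would send HTTP requests to running services rather than call a local function.

```python
from dataclasses import dataclass
from typing import Callable

# All names here are hypothetical illustrations, not taken from the paper.


@dataclass
class Outcome:
    test_name: str
    status: int
    server_side_failure: bool


def is_server_side_failure(status: int) -> bool:
    # Assumption: 5xx means the service failed to handle the input,
    # while 4xx means it rejected the input gracefully.
    return 500 <= status <= 599


def call_endpoint(handler: Callable[[dict], int], payload: dict) -> int:
    # Mimic a web framework that turns an uncaught exception into HTTP 500.
    try:
        return handler(payload)
    except Exception:
        return 500


def toy_create_order(payload: dict) -> int:
    # Toy endpoint with a robustness bug: it reads "quantity"
    # without first checking that the key exists.
    quantity = payload["quantity"]
    if not isinstance(quantity, int) or quantity <= 0:
        return 400
    return 201


def run_robustness_tests(handler: Callable[[dict], int],
                         tests: dict[str, dict]) -> list[Outcome]:
    outcomes = []
    for name, payload in tests.items():
        status = call_endpoint(handler, payload)
        outcomes.append(Outcome(name, status, is_server_side_failure(status)))
    return outcomes


tests = {
    "valid": {"quantity": 2},
    "missing_quantity": {},                 # missing input
    "zero_quantity": {"quantity": 0},       # boundary value
    "text_quantity": {"quantity": "two"},   # malformed type
}

for outcome in run_robustness_tests(toy_create_order, tests):
    print(outcome)
```

With these inputs only the missing-key test is flagged: the boundary and malformed-type inputs are rejected with 400, while the absent key raises an exception that the wrapper surfaces as a 500.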

METHOD

Full abstract

Malformed, missing, or boundary-value inputs in microservice APIs can cascade across dependent services, threatening reliability. Robustness testing systematically exercises such inputs to expose server-side failures, but generating diverse, effective tests remains challenging. Large Language Models can generate such tests from API specifications; however, it is unknown whether different models and prompt strategies produce diverse failure sets or converge on the same failures. We report a controlled experiment applying 7 prompt strategies to 3 open-source LLMs (14B-70B parameters) targeting 2 architecturally distinct microservice systems: one monolingual Java system (6 services, 9 failure modes) and one polyglot system (27 services, 14 failure modes), yielding 38 valid runs and 663 generated tests. We find that prompt strategy explains more variation in diversity than model size: a Structured prompt collapses diversity entirely, while a single model varied across three prompt strategies achieves complete failure-mode coverage on one system, outperforming any multi-model ensemble under a fixed prompt. We introduce two strategies, Guided and GuidedFewShot, that embed a mutation taxonomy from prior robustness testing research as domain context. GuidedFewShot achieves the highest single-run coverage on both systems (5 of 9 and 8 of 14 failure modes) while maintaining low cross-model similarity. A key lesson is that taxonomy rules alone are insufficient: LLMs cannot distinguish key-absent from value-empty mutations without concrete examples. Findings replicate across both systems.
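The key-absent versus value-empty distinction is easier to see on a concrete payload. The sketch below is a hypothetical illustration, not the paper's taxonomy or prompt text: the helpers `key_absent`, `value_empty`, and `few_shot_example`, the example `order` body, and the type-based choice of "empty" value are all assumptions. It renders the kind of concrete example pair that, according to the abstract, LLMs need before they can tell the two mutations apart.

```python
import copy
import json

# Hypothetical helpers; the paper's actual mutation taxonomy may differ.


def key_absent(payload: dict, key: str) -> dict:
    """Remove the field entirely: the key no longer appears in the body."""
    mutated = copy.deepcopy(payload)
    mutated.pop(key, None)
    return mutated


def value_empty(payload: dict, key: str) -> dict:
    """Keep the field but replace its value with an empty one."""
    mutated = copy.deepcopy(payload)
    # Assumption: strings become "", lists [], objects {}; anything else null.
    empty_by_type = {str: "", list: [], dict: {}}
    mutated[key] = empty_by_type.get(type(payload[key]))
    return mutated


def few_shot_example(payload: dict, key: str) -> str:
    """Render a concrete mutation pair that could be embedded in a prompt."""
    return "\n".join([
        f"original:    {json.dumps(payload)}",
        f"key-absent:  {json.dumps(key_absent(payload, key))}",
        f"value-empty: {json.dumps(value_empty(payload, key))}",
    ])


order = {"userId": "u-42", "items": ["sku-1"]}
print(few_shot_example(order, "userId"))
```

The printed pair shows one body that drops `userId` entirely and one that keeps `userId` with an empty string. A service may handle these very differently, for example raising on a missing key while accepting an empty string, which is why the two mutations can expose different failure modes.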

RESULT

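ScienceToStartup currently rates this paper 4.0/10 on its public viability pass. The paper finds that prompt strategy explains more variation in test diversity than model size: a Structured prompt collapses diversity entirely, while a single model varied across three prompt strategies achieves complete failure-mode coverage on one system, outperforming any multi-model ensemble under a fixed prompt.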
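One plausible way to quantify coverage and diversity is sketched below, assuming each run is summarized by the set of failure modes its tests triggered and that similarity between runs is the Jaccard index of those sets; the paper's actual metrics may differ. The failure-mode labels `M1` to `M6` and the run sets are invented toy data chosen to mirror the qualitative pattern described above, not results from the study.

```python
from itertools import combinations

# Assumption: a run is represented by the set of failure modes its tests triggered.


def coverage(detected: set[str], all_modes: set[str]) -> float:
    """Fraction of known failure modes triggered by one run."""
    return len(detected & all_modes) / len(all_modes)


def union_coverage(runs: list[set[str]], all_modes: set[str]) -> float:
    """Coverage achieved by pooling several runs, e.g. an ensemble."""
    return coverage(set().union(*runs), all_modes)


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_similarity(runs: list[set[str]]) -> float:
    """Average Jaccard similarity over all pairs of runs (needs two or more runs)."""
    pairs = list(combinations(runs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


ALL_MODES = {"M1", "M2", "M3", "M4", "M5", "M6"}

# Toy data: three models sharing one prompt vs one model with three prompts.
fixed_prompt_runs = [{"M1", "M2"}, {"M1", "M2"}, {"M1", "M2", "M3"}]
varied_prompt_runs = [{"M1", "M2"}, {"M3", "M4"}, {"M5", "M6"}]

for label, runs in [("fixed prompt", fixed_prompt_runs),
                    ("varied prompt", varied_prompt_runs)]:
    print(label, union_coverage(runs, ALL_MODES),
          round(mean_pairwise_similarity(runs), 3))
```

In this toy setup, three models under one prompt mostly rediscover the same failures (high mean similarity, half the modes covered), while one model under three different prompts finds disjoint failures and covers every mode. Pooled coverage only grows when runs disagree, which is why low cross-run similarity matters.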

WHY NOW

LLM for Software Testing moved forward this cycle; last verified May 2026. Public score 4.0/10.



Competitive landscape


Segment: LLM for Software Testing

Adoption evidence: No public code link in the paper record yet

Commercial read: 4.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified



