ARXIV:2604.17966 · LLM EVALUATION · SUBMITTED 21 APR · 02:39 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

TPS-CalcBench: A Benchmark and Diagnostic Evaluation Framework for LLM Analytical Calculation Competence in Hypersonic Thermal Protection System Engineering

Jinglai Zheng · Chuhan Qiao · Haiming Huang · arXiv

TPS-CalcBench is a diagnostic benchmark and evaluation framework for LLM analytical calculation competence in safety-critical aerospace engineering, including intervention methods.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain TPS-CalcBench is a diagnostic benchmark and evaluation framework for LLM analytical calculation competence in safety-critical aerospace engineering, including intervention methods.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

TPS-CalcBench is a diagnostic benchmark and evaluation framework for LLM analytical calculation competence in safety-critical aerospace engineering, including intervention methods. In hypersonic thermal protection system (TPS) design, inaccurate stagnation-point heat flux or boundary-layer calculations…
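To make the kind of closed-form calculation at stake concrete, here is a minimal sketch of a stagnation-point heat flux estimate. It assumes the Sutton-Graves correlation for Earth air, q = k · sqrt(ρ / R_n) · V³ with k ≈ 1.7415e-4 in SI units. The source does not say which correlations the benchmark uses, so the choice of formula, the function name `stagnation_heat_flux`, and the sample flight condition are all illustrative assumptions.

```python
import math

# Sutton-Graves constant for Earth air in SI units (kg^0.5 / m).
# Assumption of this sketch: the benchmark's own formulas are not given in the source.
K_EARTH = 1.7415e-4


def stagnation_heat_flux(rho: float, velocity: float, nose_radius: float) -> float:
    """Convective stagnation-point heat flux in W/m^2.

    rho: freestream density in kg/m^3
    velocity: freestream velocity in m/s
    nose_radius: nose radius in m
    """
    # Reject physically invalid inputs rather than returning a plausible-looking number.
    if rho <= 0 or velocity <= 0 or nose_radius <= 0:
        raise ValueError("density, velocity, and nose radius must be positive")
    return K_EARTH * math.sqrt(rho / nose_radius) * velocity ** 3


# Hypothetical flight condition, chosen only for illustration.
q = stagnation_heat_flux(rho=3.0e-4, velocity=7000.0, nose_radius=1.0)
print(f"stagnation-point heat flux: {q / 1e4:.1f} W/cm^2")
```

Because the flux scales with V³ and with the inverse square root of the nose radius, a unit slip or a wrong exponent can still produce a number of plausible magnitude, which is the failure mode the problem statement describes.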

METHOD

Full abstract

Deploying LLMs as reasoning assistants in safety-critical aerospace engineering requires stricter evaluation criteria than general scientific benchmarks provide. In hypersonic thermal protection system (TPS) design, an inaccurate stagnation-point heat flux or boundary-layer calculation may cause catastrophic design-margin violations. A model that gives numerically reasonable but physically invalid answers is more dangerous than one that declines to respond. Current scientific benchmarks test only abstract math and basic physics, evaluate only final answers, ignore the engineering reasoning process, and cannot detect such critical failures. We propose TPS-CalcBench, the first diagnostic benchmark for closed-form analytical calculations in hypersonic aerodynamics and high-temperature gas dynamics of the kind that experienced TPS engineers carry out without simulations. Our contributions are: (1) a domain-oriented task taxonomy with 4 difficulty levels and 8 categories drawn from Anderson's textbook; (2) a dual-track evaluation that measures result accuracy and reasoning quality via an 8-dimension rubric and a calibrated judge with human audit, to identify right-answer-wrong-reasoning cases; (3) a human-AI data pipeline that produces 420 high-confidence core items and 810 noise-controlled pre-gating items from 4560 raw items; (4) a noise-sensitivity analysis that measures how data quality affects model ranking; and (5) three diagnostic intervention methods: DFA-TPS fine-tuning, RAG-EQ retrieval grounding, and PA-CoT process-aware prompting. Tests on 13 models from 7 groups show wide performance differences (KPI 12.6-87.9), hidden formula-selection defects, data-driven rank changes, and effective intervention improvements, establishing a complete diagnose-evaluate-intervene framework for assessing LLM deployment in safety-critical engineering.
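The dual-track idea in the abstract, scoring the final number and the reasoning separately so that a correct answer reached by invalid reasoning is flagged, can be sketched as follows. This is a minimal illustration, not the paper's scoring code: the relative tolerance, the pass threshold, the unweighted mean over rubric dimensions, and all names (`Item`, `diagnose`, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical thresholds; the paper's tolerance and rubric scale are not given in the source.
REL_TOL = 0.05
REASONING_PASS = 0.6


@dataclass
class Item:
    reference: float                # ground-truth numerical answer
    answer: float                   # model's numerical answer
    rubric_scores: Sequence[float]  # one score in [0, 1] per rubric dimension


def result_correct(item: Item, rel_tol: float = REL_TOL) -> bool:
    """Track 1: is the final number within a relative tolerance of the reference?"""
    return abs(item.answer - item.reference) <= rel_tol * abs(item.reference)


def reasoning_score(item: Item) -> float:
    """Track 2: unweighted mean over rubric dimensions (an assumed aggregation)."""
    return sum(item.rubric_scores) / len(item.rubric_scores)


def diagnose(item: Item) -> str:
    """Combine both tracks into one diagnostic label."""
    ok_result = result_correct(item)
    ok_reasoning = reasoning_score(item) >= REASONING_PASS
    if ok_result and ok_reasoning:
        return "sound"
    if ok_result:
        return "right answer, wrong reasoning"
    if ok_reasoning:
        return "sound reasoning, wrong result"
    return "failed"


# A numerically close answer backed by weak reasoning across 8 rubric dimensions.
print(diagnose(Item(reference=100.0, answer=102.0, rubric_scores=[0.2] * 8)))
```

A final-answer-only benchmark would count the sample item as a pass; keeping the two tracks separate is what lets that case surface as its own category.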

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Our contributions include domain-oriented task taxonomy with 4 difficulty levels and 8 categories from Anderson's textbook, dual-track evaluation measuring result accuracy and reasoning quality…

WHY NOW

LLM Evaluation moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Competitive landscape

Segment: LLM Evaluation

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified
