ARXIV:2603.15916 · AUTOML · SUBMITTED 18 MAR · 22:54 UTC · FRESHNESS STALE

Source: PDF linked (verified)

PaperPack: 3 of 4 citation fields filled (partial)

Missing fields: authors

Proof: partial proof status

Auto Researching, not hyperparameter tuning: Convergence Analysis of 10,000 Experiments

arXiv

A framework for LLM agents to autonomously design and optimize ML experiments through genuine architecture search.

Blocked on Code · Score 7.0 · Evidence partial

Opportunity summary

Pain A framework for LLM agents to autonomously design and optimize ML experiments through genuine architecture search.

Evidence 0 refs | 0 sources | 50% coverage

Blocker Evidence partial


PROBLEM

A framework for LLM agents to autonomously design and optimize ML experiments through genuine architecture search. We answer this question by analyzing 10,469 experiments executed by two LLM agents (Claude Opus and Gemini 2.5…

METHOD

Full abstract

When LLM agents autonomously design ML experiments, do they perform genuine architecture search -- or do they default to hyperparameter tuning within a narrow region of the design space? We answer this question by analyzing 10,469 experiments executed by two LLM agents (Claude Opus and Gemini 2.5 Pro) across a combinatorial configuration space of 108,000 discrete cells for dashcam collision detection over 27 days. Through ANOVA decomposition, we find that architectural choices explain 94% of performance variance ($F = 1324$, $\eta^2 = 0.94$), while hyperparameter variation within a fixed architecture explains only 6%. Cross-task validation on a second collision dataset confirms this finding (75% architecture-explained variance) with a different winning backbone, confirming genuine architecture discovery. The agents' key contribution is discovering that V-JEPA 2 video features with Zipformer temporal encoders achieve 0.9245 AP -- a configuration no human proposed -- and concentrating search on productive architectural regions: at $N = 50$, LLM-guided search reaches AP $= 0.985$ versus $0.965$ for from-scratch random search. Post-bugfix convergence follows a power law ($c = 0.11$, $R^2 = 0.93$); the low exponent reflects the cost of broad exploration, not inefficiency, since the LLM discovers qualitatively better regions than random or Bayesian baselines. We characterize multi-agent search dynamics via entropy cycles and Jensen--Shannon specialization, providing the first large-scale empirical framework for LLM-guided combinatorial ML experiment design.
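The variance split reported in the abstract ($\eta^2$ for architecture versus the remainder for within-architecture variation) can be illustrated with a one-way ANOVA decomposition. The sketch below is a simplification under stated assumptions, not the paper's analysis: it treats architecture as a single categorical factor, the function name `eta_squared` is hypothetical, and the toy AP values are invented for illustration rather than taken from the experiments.

```python
from collections import defaultdict
from statistics import fmean


def eta_squared(groups, scores):
    """Return (eta^2, F) for a one-way ANOVA of scores grouped by label.

    eta^2 = SS_between / SS_total is the share of variance explained by
    group membership; 1 - eta^2 is the share left within groups.
    """
    if len(groups) != len(scores) or not scores:
        raise ValueError("groups and scores must be non-empty and equal length")

    by_group = defaultdict(list)
    for label, score in zip(groups, scores):
        by_group[label].append(score)

    grand_mean = fmean(scores)
    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    ss_between = sum(
        len(vals) * (fmean(vals) - grand_mean) ** 2 for vals in by_group.values()
    )
    ss_within = ss_total - ss_between

    k, n = len(by_group), len(scores)
    if ss_total == 0 or k < 2 or n <= k:
        # Degenerate input: no variance to split, or F is undefined.
        return 0.0, float("nan")

    eta2 = ss_between / ss_total
    # F = mean square between / mean square within.
    f_stat = (
        float("inf")
        if ss_within == 0
        else (ss_between / (k - 1)) / (ss_within / (n - k))
    )
    return eta2, f_stat


# Toy data (invented): two architectures, three hyperparameter settings each.
architectures = ["A", "A", "A", "B", "B", "B"]
average_precision = [0.90, 0.91, 0.92, 0.70, 0.71, 0.72]

eta2, f_stat = eta_squared(architectures, average_precision)
print(f"architecture share: {eta2:.3f}, within-architecture share: {1 - eta2:.3f}")
```

In this toy case nearly all of the variance sits between architectures because the hyperparameter settings move AP by only 0.01 while the architectures differ by 0.20. A multi-factor design such as the one the abstract describes would need a factorial ANOVA rather than this single-factor split.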

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. The agents' key contribution is discovering that V-JEPA 2 video features with Zipformer temporal encoders achieve 0.9245 AP -- a configuration no human proposed --…

WHY NOW

AutoML moved forward this cycle; last verified April 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.


Opportunity summary

Score 7.0

Pain A framework for LLM agents to autonomously design and optimize ML experiments through genuine architecture search.

Evidence 0 refs | 0 sources | 50% coverage

Blocker missing authors

Analysis summary

A framework for LLM agents to autonomously design and optimize ML experiments through genuine architecture search.


Competitive landscape

A framework for LLM agents to autonomously design and optimize ML experiments through genuine architecture search.

Segment

AutoML

Adoption evidence

Public code linked for build inspection

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


Code repository

https://github.com/warlockee/orze

Published

2026-03-16T21:05:39.000Z

arXiv

https://arxiv.org/abs/2603.15916

Page

https://sciencetostartup.com/paper/auto-researching-not-hyperparameter-tuning-convergence-analysis-of-10-000-experiments

FAQ

What products could be built from this research?

Why now: The timing is ripe due to the proliferation of video data in autonomous vehicles and surveillance, combined with advances in LLM agents capable of combinatorial search. Market conditions include rising demand for real-time, accurate AI models in safety-critical industries and increasing compute costs, making efficiency gains from automated architecture discovery commercially urgent.

What are the practical use cases?

A commercial use case is an automated ML architecture search platform for dashcam and autonomous vehicle companies, where the system continuously experiments with video feature extractors and temporal encoders to optimize collision detection models, achieving higher accuracy (e.g., 0.9245 AP) than human-designed baselines, reducing development cycles from months to weeks.



