ARXIV:2603.28422 · ROBOTICS · SUBMITTED 31 MAR · 20:53 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Active Stereo-Camera Outperforms Multi-Sensor Setup in ACT Imitation Learning for Humanoid Manipulation

Robin Kühn · Moritz Schappler · Thomas Seel · Dennis Bank · arXiv

This research demonstrates that a minimal active stereo-camera setup significantly outperforms complex multi-sensor configurations for humanoid robot imitation learning, offering a more efficient and robust approach to task acquisition.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: This research demonstrates that a minimal active stereo-camera setup significantly outperforms complex multi-sensor configurations for humanoid robot imitation learning, offering a more efficient and robust approach to task acquisition.

Evidence: 35 refs | 4 sources | 50% coverage

Blocker: Evidence unverified


PROBLEM

METHOD

Full abstract

The complexity of teaching humanoid robots new tasks is one of the major reasons hindering their widespread adoption in the industry. While Imitation Learning (IL), particularly Action Chunking with Transformers (ACT), enables rapid task acquisition, there is no consensus yet on the optimal sensory hardware required for manipulation tasks. This paper benchmarks 14 sensor combinations on the Unitree G1 humanoid robot equipped with three-finger hands for two manipulation tasks. We explicitly evaluate the integration of tactile and proprioceptive modalities alongside active vision. Our analysis demonstrates that strategic sensor selection can outperform complex configurations in data-limited regimes while reducing computational overhead. We develop an open-source Unified Ablation Framework that utilizes sensor masking on a comprehensive master dataset. Results indicate that additional modalities often degrade performance for IL with limited data. A minimal active stereo-camera setup outperformed complex multi-sensor configurations, achieving 87.5% success in a spatial generalization task and 94.4% in a structured manipulation task. Conversely, adding pressure sensors to this setup reduced success to 67.3% in the latter task due to a low signal-to-noise ratio. We conclude that in data-limited regimes, active vision offers a superior trade-off between robustness and complexity. While tactile modalities may require larger datasets to be effective, our findings validate that strategic sensor selection is critical for designing an efficient learning process.
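The abstract describes the ablation approach only at a high level: one master dataset is recorded with every sensor, and individual configurations are derived by masking. A minimal sketch of that idea follows. It is not the authors' framework: the modality names (including `wrist_camera`), the helper functions `mask_observation` and `sensor_configurations`, and the choice to zero-fill masked channels rather than drop them are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical modality names; the paper's actual sensor list is not given here.
MODALITIES = ("stereo_camera", "wrist_camera", "tactile", "proprioception")


def mask_observation(observation, enabled):
    """Return a copy of `observation` with disabled modalities zeroed out.

    Zero-filling (an assumption, not the paper's stated method) keeps the
    input shape fixed, so one policy architecture can be reused across
    sensor configurations.
    """
    unknown = set(enabled) - set(observation)
    if unknown:
        raise KeyError(f"unknown modalities: {sorted(unknown)}")
    return {
        name: list(values) if name in enabled else [0.0] * len(values)
        for name, values in observation.items()
    }


def sensor_configurations(modalities, required=()):
    """Yield every non-empty subset of `modalities` that contains `required`."""
    for size in range(1, len(modalities) + 1):
        for subset in combinations(modalities, size):
            if set(required) <= set(subset):
                yield subset


# One frame of a made-up master recording with all sensors present.
master_frame = {
    "stereo_camera": [0.2, 0.4, 0.6],
    "wrist_camera": [0.1, 0.3],
    "tactile": [0.05],
    "proprioception": [1.0, -1.0],
}

# Derive a vision-only view without re-recording any data.
vision_only = mask_observation(master_frame, enabled={"stereo_camera"})

# Enumerate the configurations that always keep the stereo camera.
configs = list(sensor_configurations(MODALITIES, required=("stereo_camera",)))
```

Because masking happens at load time, every configuration is trained and evaluated on the same demonstrations, which is what makes the comparison between sensor sets a controlled ablation.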

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. While Imitation Learning (IL), particularly Action Chunking with Transformers (ACT), enables rapid task acquisition, there is no consensus yet on the optimal sensory hardware…

WHY NOW

Robotics moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.


Blocker: no shell-level blocker reported


Competitive landscape

Segment: Robotics

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Published 2026-03-30 13:30 UTC · https://arxiv.org/abs/2603.28422

