DrivoR

Gold definition. Updated Apr 2, 2026

DrivoR is a simple and efficient transformer-based architecture for end-to-end autonomous driving. It builds on pretrained Vision Transformers (ViTs) and introduces camera-aware register tokens, which compress features from multiple cameras into a compact, unified scene representation. This representation feeds two lightweight transformer decoders: one generates candidate driving trajectories and the other scores them. The scoring decoder is trained to mimic an oracle and outputs interpretable sub-scores for critical aspects such as safety, comfort, and efficiency, which enables adaptive, behavior-conditioned driving at inference time. DrivoR aims to reach high driving accuracy at a substantially lower computational cost, which makes it better suited to real-world deployment. It is relevant to researchers and ML engineers working on perception, planning, and control for autonomous vehicles and robotics.

DrivoR's Core Architecture

Transformer-Based Foundation: DrivoR is a pure-transformer architecture that uses pretrained Vision Transformers (ViTs) as its image encoder. Starting from pretrained ViT weights, a common choice in modern perception systems for autonomous driving, provides robust visual features without training the encoder from scratch.
Multi-Camera Feature Compression: A key component of DrivoR is its camera-aware register tokens. Register tokens are extra learnable tokens processed alongside a ViT's image patch tokens; DrivoR uses them to compress features from multiple cameras into a compact scene representation, so the downstream decoders work with a small set of tokens rather than every patch token from every camera. This matters because raw multi-camera sensor data is high-dimensional.
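The following NumPy sketch illustrates the register-token compression idea under simplifying assumptions. A single residual self-attention layer without learned projections stands in for the pretrained ViT, and "camera-aware" is modeled as one shared set of learnable registers plus a learned per-camera embedding; DrivoR's actual parameterization may differ (for example, separate register sets per camera). The function names `attention` and `compress_cameras` and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(q, k, v):
    """Single-head scaled dot-product attention (no learned projections)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def compress_cameras(patch_tokens, registers, camera_embeddings):
    """Compress per-camera patch tokens into a compact set of scene tokens.

    patch_tokens:      (num_cameras, num_patches, dim) features per camera
    registers:         (num_registers, dim) learnable register tokens (shared here)
    camera_embeddings: (num_cameras, dim) per-camera embedding that makes each
                       camera's registers "camera-aware" (an assumption)
    Returns (num_cameras * num_registers, dim) scene tokens.
    """
    scene = []
    for cam, patches in enumerate(patch_tokens):
        regs = registers + camera_embeddings[cam]           # tag registers with camera identity
        tokens = np.concatenate([regs, patches], axis=0)     # registers processed together with patches
        mixed = tokens + attention(tokens, tokens, tokens)   # one residual layer stands in for the ViT
        scene.append(mixed[: len(regs)])                     # keep only the register outputs
    return np.concatenate(scene, axis=0)

num_cameras, num_patches, num_registers, dim = 4, 256, 16, 32
patches = rng.normal(size=(num_cameras, num_patches, dim))
registers = rng.normal(size=(num_registers, dim))
cam_emb = rng.normal(size=(num_cameras, dim))

scene_tokens = compress_cameras(patches, registers, cam_emb)
print(scene_tokens.shape)  # (64, 32)
```

With these illustrative sizes, the decoders attend over 64 register outputs instead of 1,024 patch tokens. Because cross-attention cost grows with the number of keys, this shrinks the work done downstream of the encoder, although the encoder itself still processes every patch.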

DrivoR's Trajectory Generation and Scoring

Dual Decoder System: The compact scene representation generated by the register tokens drives two lightweight transformer decoders. One decoder is responsible for generating a set of candidate driving trajectories, while the second decoder focuses on evaluating these proposed paths.
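Oracle-Mimicking Scoring: The scoring decoder learns to mimic an oracle, predicting interpretable sub-scores for critical driving aspects such as safety, comfort, and efficiency. Keeping these sub-scores separate makes each trajectory's ranking interpretable and enables adaptive, behavior-conditioned driving at inference time.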
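A minimal NumPy sketch of the dual-decoder design follows. It assumes that each decoder is reduced to a single cross-attention layer over the scene tokens. Learned queries become K candidate trajectories of (x, y) waypoints, and the scorer embeds each candidate and outputs one sigmoid sub-score per aspect. The names `trajectory_decoder` and `scoring_decoder`, the sub-score list, the random weights, and all sizes are illustrative assumptions, not DrivoR's published configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
SUB_SCORES = ("safety", "comfort", "efficiency")  # aspects named in the text

def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def trajectory_decoder(scene, queries, w_out, horizon):
    """Each learned query cross-attends to the scene and is projected to a trajectory."""
    h = queries + attention(queries, scene, scene)            # (K, dim)
    return (h @ w_out).reshape(len(queries), horizon, 2)      # (K, horizon, xy)

def scoring_decoder(scene, trajs, w_embed, w_head):
    """Embed each candidate, cross-attend to the scene, output sub-scores in (0, 1)."""
    emb = trajs.reshape(len(trajs), -1) @ w_embed             # (K, dim)
    h = emb + attention(emb, scene, scene)
    return sigmoid(h @ w_head)                                # (K, len(SUB_SCORES))

dim, num_queries, horizon = 32, 8, 6
scene = rng.normal(size=(64, dim))            # stands in for the compressed scene tokens
queries = rng.normal(size=(num_queries, dim))
w_out = rng.normal(size=(dim, horizon * 2)) * 0.1
w_embed = rng.normal(size=(horizon * 2, dim)) * 0.1
w_head = rng.normal(size=(dim, len(SUB_SCORES))) * 0.1

trajs = trajectory_decoder(scene, queries, w_out, horizon)
sub_scores = scoring_decoder(scene, trajs, w_embed, w_head)
print(trajs.shape, sub_scores.shape)  # (8, 6, 2) (8, 3)
```

The scorer reads the candidate trajectories as input, so generation and evaluation stay decoupled. That separation is what allows the sub-scores to be inspected or recombined without regenerating the candidates.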

At a glance

Executive summary

DrivoR is a new AI system for self-driving cars that uses a special type of neural network called a transformer. It's designed to be very efficient by compressing camera data, and it can generate and evaluate driving paths, even explaining why it chooses a certain path based on factors like safety and comfort.

TL;DR

DrivoR is an efficient AI system for self-driving cars that uses transformers to process camera data and intelligently plan driving paths, even explaining its choices.

Key points

Uses camera-aware register tokens to compress multi-camera features, feeding into dual transformer decoders for trajectory generation and oracle-mimicking scoring.
Achieves accurate and adaptive end-to-end autonomous driving with significantly reduced computational overhead, making deployment more feasible.
Relevant to researchers and engineers developing autonomous vehicles, robotics, and intelligent transportation systems.
Unlike traditional modular pipelines that separate perception, planning, and control, DrivoR is a single end-to-end pure-transformer model that goes directly from camera images to scored driving trajectories.
Reflects a broader research focus on efficient, end-to-end transformer architectures and on interpretable AI for safety-critical applications such as autonomous driving.
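One plausible way to train a scorer to mimic an oracle is to regress the oracle's per-candidate sub-scores, for example with binary cross-entropy on values in [0, 1]. This sketch assumes that loss and uses made-up oracle values. The function `oracle_mimicking_loss` is hypothetical, and DrivoR's actual training objective and oracle may differ.

```python
import numpy as np

def oracle_mimicking_loss(pred, oracle, eps=1e-7):
    """Mean binary cross-entropy between predicted and oracle sub-scores.

    pred, oracle: (num_candidates, num_sub_scores) arrays with values in [0, 1].
    The oracle values are assumed to come from an evaluator with privileged
    information; the scorer learns to reproduce them from camera input alone.
    """
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    bce = -(oracle * np.log(pred) + (1.0 - oracle) * np.log(1.0 - pred))
    return bce.mean()

# Columns: safety, comfort, efficiency (illustrative values).
oracle = np.array([[1.0, 0.8, 0.6],    # candidate 0: safe, fairly comfortable
                   [0.0, 0.9, 0.9]])   # candidate 1: collides, so safety = 0
good = np.array([[0.95, 0.8, 0.6], [0.05, 0.9, 0.9]])
bad = np.array([[0.1, 0.2, 0.3], [0.9, 0.1, 0.2]])
print(oracle_mimicking_loss(good, oracle) < oracle_mimicking_loss(bad, oracle))  # True
```

Because the targets are per-aspect rather than a single scalar, the trained scorer exposes why a candidate ranks low (for example, a near-zero safety sub-score), which is the source of the interpretability described above.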

Use cases

Autonomous Passenger Vehicles: Enabling safer and more efficient self-driving cars by providing interpretable decision-making and reduced computational load for real-time operation.
Logistics and Delivery Robots: Deploying autonomous capabilities in delivery vans or last-mile robots where computational resources might be constrained, but reliable navigation is critical.
Industrial Autonomous Vehicles: Guiding forklifts or transport robots in factories and warehouses, benefiting from adaptive behavior (e.g., prioritizing efficiency on open paths, safety in crowded areas).
Simulation and Testing Platforms: Serving as a robust and efficient baseline for evaluating new autonomous driving scenarios and algorithms within high-fidelity simulators like HUGSIM.
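The adaptive behavior described in the industrial use case can be illustrated by combining predicted sub-scores with a behavior profile chosen at inference time. This sketch assumes a simple normalized weighted sum and two hypothetical profiles (`crowded_area`, `open_path`). The function `select_trajectory` and the sub-score values are invented for illustration.

```python
import numpy as np

SUB_SCORES = ("safety", "comfort", "efficiency")

# Hypothetical behavior profiles: relative weight of each sub-score.
PROFILES = {
    "crowded_area": np.array([0.7, 0.2, 0.1]),
    "open_path": np.array([0.2, 0.2, 0.6]),
}

def select_trajectory(sub_scores, weights):
    """Return the index of the candidate with the highest weighted score.

    sub_scores: (num_candidates, len(SUB_SCORES)) from the scoring decoder.
    weights:    (len(SUB_SCORES),) behavior profile chosen at inference time.
    """
    combined = sub_scores @ (weights / weights.sum())
    return int(np.argmax(combined))

# Two candidates: a cautious one and a fast one.
sub_scores = np.array([[0.95, 0.9, 0.3],    # cautious: slow but very safe
                       [0.75, 0.7, 0.95]])  # fast: efficient, smaller safety margin
for name, w in PROFILES.items():
    print(name, select_trajectory(sub_scores, w))
# crowded_area 0
# open_path 1
```

A plain weighted sum lets a high efficiency score offset a weak safety score. A deployed system would more likely treat safety as a hard constraint, for example by discarding candidates below a safety threshold before ranking the rest.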