DrivoR is a novel, simple, and efficient transformer-based architecture for end-to-end autonomous driving, built on pretrained Vision Transformers (ViTs). Its core mechanism is a set of "camera-aware register tokens" that compress multi-camera sensor data into a compact, unified scene representation. This representation feeds two lightweight transformer decoders: one generates candidate driving trajectories, and the other scores them. The scoring decoder learns to mimic an oracle, producing interpretable sub-scores for key driving aspects such as safety, comfort, and efficiency. These sub-scores enable adaptive, behavior-conditioned driving at inference time. DrivoR aims to achieve high driving accuracy while significantly reducing computational overhead, which makes it suitable for real-world deployment. It is relevant to researchers and ML engineers working on perception, planning, and control for autonomous vehicles and robotics.
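As a rough illustration of how separate sub-scores can support behavior-conditioned selection, here is a minimal, hypothetical sketch. It assumes the scoring decoder has already produced one sub-score per aspect for each candidate trajectory. It then combines those sub-scores with a weighted average whose weights encode a driving style. The candidate names, score values, weight profiles, and the weighted-average rule are all illustrative assumptions, not DrivoR's published scoring function.

```python
from dataclasses import dataclass

# Hypothetical sub-score names; the actual model's sub-scores may differ.
SUB_SCORES = ("safety", "comfort", "efficiency")


@dataclass
class Candidate:
    """A candidate trajectory plus the sub-scores a scoring decoder predicted for it."""
    name: str
    scores: dict[str, float]  # sub-score name -> value in [0, 1]


def aggregate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of sub-scores (an assumed aggregation rule, for illustration)."""
    total_weight = sum(weights[k] for k in SUB_SCORES)
    return sum(weights[k] * scores[k] for k in SUB_SCORES) / total_weight


def select_trajectory(candidates: list[Candidate], weights: dict[str, float]) -> Candidate:
    """Return the candidate with the highest aggregated score under the given weights."""
    return max(candidates, key=lambda c: aggregate(c.scores, weights))


# Hypothetical candidates with made-up predicted sub-scores.
candidates = [
    Candidate("keep_lane_slow", {"safety": 0.95, "comfort": 0.90, "efficiency": 0.40}),
    Candidate("overtake", {"safety": 0.70, "comfort": 0.60, "efficiency": 0.95}),
    Candidate("keep_lane_fast", {"safety": 0.85, "comfort": 0.75, "efficiency": 0.75}),
]

# Two hypothetical behavior profiles: same candidates and sub-scores,
# different weights chosen at inference time.
cautious = {"safety": 3.0, "comfort": 2.0, "efficiency": 1.0}
assertive = {"safety": 1.0, "comfort": 0.5, "efficiency": 3.0}

print(select_trajectory(candidates, cautious).name)   # → keep_lane_slow
print(select_trajectory(candidates, assertive).name)  # → overtake
```

Because the sub-scores stay separate until this final step, switching profiles in this sketch changes which trajectory wins without recomputing any sub-scores. A real system might instead treat safety as a hard gate (for example, a multiplicative penalty for predicted collisions) rather than trading it off linearly against comfort and efficiency.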
DrivoR is a new AI system for self-driving cars built on a type of neural network called a transformer. It is designed to be efficient by compressing camera data, and it can both generate and evaluate driving paths, scoring each one on factors like safety and comfort so that its choice of path can be explained.