139 papers - avg viability 6.2
Current research in computer vision is increasingly focused on enhancing the robustness and efficiency of models through multimodal approaches and innovative architectures. Recent work on road surface classification illustrates this trend by integrating image and inertial data to improve performance across diverse environmental conditions, addressing a critical need in predictive maintenance systems. Similarly, cross-modal learning techniques are being applied to ship re-identification and plankton recognition, leveraging the strengths of different sensing modalities while minimizing the reliance on extensive labeled datasets. The introduction of rotation equivariant architectures is also notable, as it aims to bolster model resilience against geometric transformations, which is essential for real-world applications. Moreover, advancements in real-time segmentation and aberration correction highlight the push towards practical, deployment-ready solutions that can operate efficiently on edge devices. Collectively, these developments indicate a shift towards more adaptable, data-efficient systems capable of tackling complex, real-world challenges in various domains.
SDF-Net uses a structure-aware network to enhance cross-modal ship re-identification between optical and SAR imagery.
A robust framework for road surface classification using a new multimodal dataset that enhances predictive maintenance via camera-IMU fusion.
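The blurb doesn't detail the fusion architecture; as a generic illustration of camera-IMU late fusion (hypothetical helpers `imu_features` and `fuse`, not the paper's method), image and inertial features can simply be summarized and concatenated before classification:

```python
import numpy as np

def imu_features(imu_window: np.ndarray) -> np.ndarray:
    """Summarize a (T, 6) accelerometer+gyroscope window with
    per-channel mean and standard deviation."""
    return np.concatenate([imu_window.mean(axis=0), imu_window.std(axis=0)])

def fuse(image_feat: np.ndarray, imu_window: np.ndarray) -> np.ndarray:
    """Late fusion: concatenate the image embedding with IMU statistics,
    ready for any downstream classifier."""
    return np.concatenate([image_feat, imu_features(imu_window)])

# toy inputs: a 128-d image embedding and 2 s of 100 Hz IMU readings
fused = fuse(np.random.rand(128), np.random.rand(200, 6))  # shape (140,)
```

Late fusion like this keeps each modality's feature extractor independent, which is one common way such systems stay robust when one sensor degrades.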
A warping-based stereo matching method reported to improve on existing approaches in both accuracy and speed.
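"Warping-based" presumably refers to resampling one view according to a disparity map; a minimal NumPy sketch of that core operation (hypothetical `warp_with_disparity`, nearest-neighbor sampling, not the paper's algorithm):

```python
import numpy as np

def warp_with_disparity(right: np.ndarray, disp: np.ndarray) -> np.ndarray:
    """Resample the right image toward the left view:
    warped[y, x] = right[y, x - disp[y, x]] (nearest-neighbor, clipped)."""
    h, w = right.shape
    xs = np.clip(np.round(np.arange(w)[None, :] - disp).astype(int), 0, w - 1)
    ys = np.arange(h)[:, None]
    return right[ys, xs]

# toy example: a horizontal ramp shifted by a constant disparity of 1
ramp = np.tile(np.arange(5.0), (3, 1))
warped = warp_with_disparity(ramp, np.ones((3, 5)))
```

Comparing the warped view against the true left image gives the photometric error that warping-based matchers typically minimize.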
PicoSAM3 is a lightweight visual segmentation model that runs in real time on edge devices, enabling efficient on-device processing.
A self-supervised cross-modal approach for efficient plankton recognition using minimal labeled data.
OSGeo tackles cross-view object geo-localization with rotated bounding boxes, improving precision while lowering annotation costs.
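The annotation format behind the cost savings is a rotated bounding box, i.e. center, size, and angle; as a plain-math illustration (hypothetical `rbox_corners`, not the paper's code), its corners follow from a 2-D rotation:

```python
import math

def rbox_corners(cx, cy, w, h, theta):
    """Corner coordinates of a rotated box given center (cx, cy),
    size (w, h), and rotation angle theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    offsets = [(w/2, h/2), (-w/2, h/2), (-w/2, -h/2), (w/2, -h/2)]
    # rotate each half-size offset about the center
    return [(cx + c*dx - s*dy, cy + s*dx + c*dy) for dx, dy in offsets]

corners = rbox_corners(0, 0, 2, 2, 0.0)  # axis-aligned case
```

Five parameters per object is far cheaper to annotate than a full segmentation mask, which is the trade-off such rotated-box methods exploit.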
OpenCap Monocular turns any smartphone into a 3D movement analytics tool for musculoskeletal insights.
Point-to-Mask advances infrared small target detection by expanding low-cost point annotations into accurate mask-level detections.
AlphaFace offers a real-time, high-fidelity face-swapping tool robust to diverse facial poses, reported to outperform current solutions in accuracy and speed.
PanoAffordanceNet enables holistic affordance grounding in 360° indoor environments, enhancing scene-level perception for embodied agents.