232 papers - avg viability 6.1
Computer vision enables machines to interpret visual information from the world. Recent developments include adaptive zoom-in techniques for GUI grounding, cross-modal learning for ship re-identification, and efficient algorithms for real-time segmentation. These advances improve accuracy and robustness in applications such as autonomous vehicles, healthcare, and augmented reality. As image and video analysis improves, computer vision is becoming a core component for builders who want to integrate visual data processing into their products. It also opens new avenues for automation and intelligent systems, which makes the field important for both research and commercialization.
SDF-Net uses a structure-aware network to enhance cross-modal ship re-identification between optical and SAR imagery.
A robust road surface classification framework that fuses camera and IMU data, built on a new multimodal dataset, to support predictive maintenance.
An uncertainty-driven adaptive zoom-in tool for more accurate GUI element localization in screenshots.
A warping-based stereo matching method reported to outperform existing methods in both accuracy and speed.
A framework for projector compensation that generalizes to unseen setups without retraining, enabled by a large dataset and a novel co-adaptive geometry and photometry correction approach.
Point-to-Mask turns low-cost point annotations into accurate mask-level detections for infrared small target detection.
OpenCap Monocular turns any smartphone into a 3D movement analytics tool for musculoskeletal insights.
SAGA-ReID reconstructs person identity representations by aligning intermediate patch tokens with CLIP's text embedding space, significantly improving performance in occluded scenarios.
A compact and efficient CNN model for plant disease detection with a user-friendly desktop application for edge deployment.
A self-supervised cross-modal approach for efficient plankton recognition using minimal labeled data.