Graph Neural Operator Towards Edge Deployability and Portability for Sparse-to-Dense, Real-Time Virtual Sensing on Irregular Grids Build Now
VIRSO offers efficient, real-time virtual sensing for sparse data on edge devices using its novel graph neural operator technology.
AI Edge Computing Apr 2 Code High viability
Contextualizing Sink Knowledge for Java Vulnerability Discovery Build Now
GONDAR identifies and exploits Java vulnerabilities through a novel LLM-assisted fuzzing framework, significantly outperforming existing tools.
Security and Vulnerability Apr 2 Code High viability
Woosh: A Sound Effects Foundation Model Build Now
Woosh is Sony AI's sound effects foundation model for generating high-quality sound effects for multimedia production.
Audio Technology Apr 2 Pending High viability
PLUME: Latent Reasoning Based Universal Multimodal Embedding Build Now
PLUME enhances universal multimodal retrieval by embedding latent reasoning for faster, more efficient inference.
AI-based Retrieval Systems Apr 2 Code High viability
NearID: Identity Representation Learning via Near-identity Distractors Build Now
NearID offers a robust identity verification system that isolates identity signals for enhanced personalization and image editing.
Identity Verification Tech Apr 2 Pending High viability
GardenDesigner: Encoding Aesthetic Principles into Jiangnan Garden Construction via a Chain of Agents Build Now
An AI framework for automating the construction of Jiangnan gardens using aesthetic principles, facilitating virtual tourism and digital heritage preservation.
AI for Digital Design and Heritage Apr 2 Code High viability
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving Build Now
A unified vision-language-action system that enhances autonomous driving by decoupling spatial perception and semantic reasoning.
AI for Autonomous Vehicles Apr 2 Pending High viability
UAV-Track VLA: Embodied Aerial Tracking via Vision-Language-Action Models Build Now
An embodied UAV tracking system that leverages vision-language-action models for dynamic real-world scenarios.
Embodied UAV Tracking Apr 2 Pending High viability
HyVGGT-VO: Tightly Coupled Hybrid Dense Visual Odometry with Feed-Forward Models Build Now
HyVGGT-VO delivers real-time dense visual odometry using a hybrid framework for efficient 3D mapping and pose estimation.
Visual Odometry Enhancement Apr 2 Code High viability
Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method Build Now
Sven is a novel optimization algorithm that significantly outperforms Adam and LBFGS on regression tasks by efficiently approximating natural gradient descent, offering faster convergence and lower loss.
Optimization Algorithms Apr 1 Code High viability
Look Twice: Training-Free Evidence Highlighting in Multimodal Large Language Models Build Now
A training-free framework that enhances multimodal LLMs' ability to identify and utilize relevant visual and textual evidence for knowledge-intensive question answering.
Multimodal LLMs Apr 1 Code High viability
UQ-SHRED: uncertainty quantification of shallow recurrent decoder networks for sparse sensing via engression Build Now
UQ-SHRED quantifies uncertainty in reconstructing complex spatiotemporal fields from sparse sensor data using a novel distributional regression approach.
Scientific AI Apr 1 Code High viability
M2-Verify: A Large-Scale Multidomain Benchmark for Checking Multimodal Claim Consistency Build Now
A large-scale, expert-validated multimodal dataset to benchmark and improve AI's ability to verify scientific claims against visual evidence, revealing significant gaps in current state-of-the-art models.
Multimodal AI Apr 1 Code High viability
Sparse Spectral LoRA: Routed Experts for Medical VLMs Build Now
A parameter-efficient medical vision-language model that achieves near full fine-tuning performance with significantly fewer trainable parameters and reduced catastrophic forgetting.
Medical AI Apr 1 Code High viability
Preference learning in shades of gray: Interpretable and bias-aware reward modeling for human preferences Build Now
A novel framework for interpretable and bias-aware reward modeling in LLMs that significantly improves preference learning accuracy.
LLM Alignment Apr 1 Code High viability
Detecting Complex Money Laundering Patterns with Incremental and Distributed Graph Modeling Build Now
A distributed graph modeling framework to detect complex money laundering patterns with reduced false positives, validated on real and synthetic datasets.
Fraud Detection Apr 1 Pending High viability
ViTs for Action Classification in Videos: An Approach to Risky Tackle Detection in American Football Practice Videos Build Now
A Vision Transformer model and expanded dataset for early detection of risky tackles in American football, enabling coach-centered injury prevention.
Sports AI Apr 1 Code High viability
Human Pose Estimation in Trampoline Gymnastics: Improving Performance Using a New Synthetic Dataset Build Now
Improve human pose estimation for extreme sports by fine-tuning models on a novel synthetic dataset.
Computer Vision Apr 1 Code High viability
Model Merging via Data-Free Covariance Estimation Build Now
A data-free method to merge AI models, inheriting capabilities without needing original training data, outperforming existing techniques.
LLM Training Apr 1 Code High viability
Evolutionary Multi-Objective Fusion of Deepfake Speech Detectors Build Now
A framework that uses evolutionary algorithms to fuse deepfake speech detectors, achieving state-of-the-art accuracy with significantly reduced system complexity.
Deepfake Detection Apr 1 Code High viability
SECURE: Stable Early Collision Understanding via Robust Embeddings in Autonomous Driving Build Now
A framework for building highly robust and stable AI models for early collision prediction in autonomous driving, significantly reducing reliability risks.
Autonomous Driving Safety Apr 1 Code High viability
Regularizing Attention Scores with Bootstrapping Build Now
A novel bootstrapping method for vision transformers that quantifies attention score uncertainty, leading to more interpretable and sparse attention maps for image analysis.
Computer Vision Apr 1 Pending High viability
Massively Parallel Exact Inference for Hawkes Processes Build Now
A massively parallel PyTorch library for exact inference of multivariate Hawkes processes, enabling analysis at unprecedented scales.
Point Processes Apr 1 Code High viability
IDEA2: Expert-in-the-loop competency question elicitation for collaborative ontology engineering Build Now
Accelerate ontology engineering by using LLMs to semi-automatically elicit competency questions from domain experts through an iterative feedback loop.
Ontology Engineering Apr 1 Pending High viability
Procedural Knowledge at Scale Improves Reasoning Build Now
A retrieval-augmented generation framework that leverages a large corpus of procedural knowledge to significantly improve language model reasoning capabilities on complex tasks.
Reasoning Agents Apr 1 Code High viability
PI-JEPA: Label-Free Surrogate Pretraining for Coupled Multiphysics Simulation via Operator-Split Latent Prediction Build Now
A physics-informed AI framework that drastically reduces the need for expensive simulation data by pretraining on unlabeled physics parameters, enabling faster and more accurate multiphysics surrogate models.
Physics-Informed AI Apr 1 Code High viability
Open-Domain Safety Policy Construction Build Now
An agentic system that automatically drafts content moderation policies from minimal seed information, outperforming existing methods and expert-written policies.
AI Safety Policy Generation Apr 1 Pending High viability
IGLOSS: Image Generation for Lidar Open-vocabulary Semantic Segmentation Build Now
Generate prototype images from text to enable zero-shot open-vocabulary semantic segmentation for 3D lidar data, outperforming existing methods.
3D Computer Vision Apr 1 Pending High viability
CogBias: Measuring and Mitigating Cognitive Bias in Large Language Models Build Now
This research quantifies and mitigates cognitive biases in LLMs by identifying and manipulating internal representations, offering a path to more reliable AI decision-making.
LLM Evaluation and Control Apr 1 Code High viability
AffordTissue: Dense Affordance Prediction for Tool-Action Specific Tissue Interaction Build Now
AffordTissue predicts tool-action specific safe interaction regions on tissue for surgical automation, outperforming general vision-language models.
Surgical AI Apr 1 Code High viability
Can LLMs Predict Academic Collaboration? Topology Heuristics vs. LLM-Based Link Prediction on Real Co-authorship Networks Build Now
Uses LLMs to predict future academic collaborations from author profiles, outperforming traditional topology heuristics and identifying novel connections.
Academic Collaboration Prediction Apr 1 Code High viability
GRAZE: Grounded Refinement and Motion-Aware Zero-Shot Event Localization Build Now
A training-free AI pipeline for precise First Point of Contact localization in American football practice videos, enabling biomechanical analysis without labeled data.
Sports AI Apr 1 Pending High viability
LESV: Language Embedded Sparse Voxel Fusion for Open-Vocabulary 3D Scene Understanding Build Now
A novel framework for open-vocabulary 3D scene understanding that uses sparse voxel rasterization and foundation models to overcome spatial and semantic ambiguities, achieving state-of-the-art performance.
3D Scene Understanding Apr 1 Code High viability
Adaptive Stopping for Multi-Turn LLM Reasoning Build Now
A conformal prediction framework for multi-turn LLM reasoning that guarantees accuracy while reducing cost and latency.
LLM Reasoning Apr 1 Code High viability
Learning When to See and When to Feel: Adaptive Vision-Torque Fusion for Contact-Aware Manipulation Build Now
An adaptive fusion strategy for vision and torque sensors in robotic manipulation that significantly improves success rates in contact-rich tasks.
Robotic Manipulation Apr 1 Code High viability
Cost-Efficient Estimation of General Abilities Across Benchmarks Build Now
A cost-efficient LLM benchmarking tool that predicts model performance on unseen tasks while cutting evaluation cost by 85%.
LLM Evaluation Apr 1 Code High viability
EgoFlow: Gradient-Guided Flow Matching for Egocentric 6DoF Object Motion Generation Build Now
EgoFlow generates physically consistent 6DoF object trajectories from egocentric video by combining a Mamba-Transformer-Perceiver architecture with gradient-guided flow matching, outperforming existing methods in accuracy and realism.
Egocentric Motion Generation Apr 1 Code High viability
ClawSafety: "Safe" LLMs, Unsafe Agents Build Now
A new benchmark and evaluation framework to rigorously test the safety of AI agents operating with elevated privileges in realistic professional environments.
AI Agents Apr 1 Code High viability
Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models Build Now
A specialized guardrail model and benchmark to mitigate food safety risks in large language models.
LLM Safety Apr 1 Code High viability
A Multi-Agent Human-LLM Collaborative Framework for Closed-Loop Scientific Literature Summarization Build Now
A multi-agent, human-in-the-loop framework that leverages LLMs and structured AI to accelerate scientific discovery by automating literature analysis and insight extraction.
AI Agents for Scientific Research Apr 1 Code High viability
Infeasibility Aware Large Language Models for Combinatorial Optimization Build Now
Fine-tune LLMs to solve complex optimization problems by detecting infeasibility and accelerating search, outperforming existing models.
Combinatorial Optimization with LLMs Apr 1 Code High viability
Wired for Overconfidence: A Mechanistic Perspective on Inflated Verbalized Confidence in LLMs Build Now
A tool to detect and mitigate overconfidence in LLM responses by identifying and intervening on specific internal circuits.
LLM Interpretability Apr 1 Code High viability
Reinforcing Consistency in Video MLLMs with Structured Rewards Build Now
This research introduces a structured reward system for video multimodal large language models to improve factual and temporal grounding, reducing hallucinations and enhancing faithfulness in video understanding.
Video MLLMs Apr 1 Code High viability
Reducing Hallucinations in LLM-based Scientific Literature Analysis Using Peer Context Outlier Detection Build Now
A novel method to reduce LLM hallucinations in scientific literature analysis by leveraging peer document context, improving data extraction accuracy and streamlining research workflows.
LLM Hallucination Reduction Apr 1 Code High viability
Low-Burden LLM-Based Preference Learning: Personalizing Assistive Robots from Natural Language Feedback for Users with Paralysis Watch
Personalize assistive robots using natural language feedback for users with paralysis, reducing user fatigue and ensuring safety.
Assistive Robotics Apr 1 High viability
Efficient Equivariant Transformer for Self-Driving Agent Modeling Build Now
A novel transformer architecture for self-driving that models agent behaviors with SE(2)-equivariance at a reduced computational cost.
Self-Driving Agent Modeling Apr 1 Code High viability
SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits Build Now
A novel, low-latency LLM guardrail that uses token-level logits to detect jailbreaks with significantly reduced resource overhead.
LLM Security Apr 1 Code High viability
Prime Once, then Reprogram Locally: An Efficient Alternative to Black-Box Service Model Adaptation Build Now
Adapt powerful closed-box AI models like GPT-4o efficiently and affordably by priming a local proxy, drastically reducing API calls and improving performance.
LLM Adaptation Apr 1 Code High viability
A Self-Evolving Agentic Framework for Metasurface Inverse Design Build Now
An AI agent that autonomously learns and refines workflows for complex optical device design, reducing the need for specialized expertise.
AI for Scientific Discovery Apr 1 Code High viability
UniRecGen: Unifying Multi-View 3D Reconstruction and Generation Build Now
A unified framework for 3D reconstruction and generation from sparse views, improving fidelity and completeness.
3D Reconstruction Apr 1 Code High viability
DISCO-TAB: A Hierarchical Reinforcement Learning Framework for Privacy-Preserving Synthesis of Complex Clinical Data Build Now
A hierarchical reinforcement learning framework for generating high-fidelity, privacy-preserving synthetic clinical data that significantly improves downstream classifier performance.
Synthetic Data Generation Apr 1 Code High viability
Type-Checked Compliance: Deterministic Guardrails for Agentic Financial Systems Using Lean 4 Theorem Proving Build Now
A formal verification platform for AI agents in finance, ensuring mathematically verifiable compliance with regulatory mandates.
AI Compliance Apr 1 Pending High viability
CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe Build Now
An LLM-based agentic framework that automates the generation and optimization of high-performance GPU kernels through iterative refinement and execution-based validation.
GPU Kernel Generation Apr 1 Code High viability
From SWE-ZERO to SWE-HERO: Execution-free to Execution-based Fine-tuning for Software Engineering Agents Build Now
A two-stage fine-tuning recipe that significantly improves the code generation and execution capabilities of open-weight LLMs for software engineering tasks.
Software Engineering Agents Apr 2 Code High viability
Beyond Logit Adjustment: A Residual Decomposition Framework for Long-Tailed Reranking Build Now
A lightweight post-hoc reranker that decomposes and corrects for biases in long-tailed classification, improving accuracy on rare classes.
Long-Tailed Classification Apr 2 Code High viability
ToolMisuseBench: An Offline Deterministic Benchmark for Tool Misuse and Recovery in Agentic Systems Build Now
A deterministic benchmark and dataset to rigorously evaluate and improve the reliability of AI agents in handling tool misuse and recovery.
Agentic Systems Apr 2 Code High viability
Robust Autonomous Control of a Magnetic Millirobot in In Vitro Cardiac Flow Build Now
Develops a vision-guided control system for autonomous navigation of magnetic millirobots in cardiac flow for targeted drug delivery.
Medical Robotics Apr 2 Code High viability
Learning ECG Image Representations via Dual Physiological-Aware Alignments Build Now
Unlock legacy ECG image data for automated cardiovascular diagnostics with a self-supervised framework that bridges the performance gap with signal-based analysis.
Medical AI Apr 2 Code High viability
ProdCodeBench: A Production-Derived Benchmark for Evaluating AI Coding Agents Build Now
A production-derived benchmark for evaluating AI coding agents, enabling more realistic performance assessment and driving improvements in agent capabilities.
AI Coding Agents Apr 2 Code High viability
A Role-Based LLM Framework for Structured Information Extraction from Healthy Food Policies Build Now
A role-based LLM framework automates structured information extraction from complex healthy food policies, overcoming common LLM limitations like hallucinations and misclassifications.
Information Extraction Apr 2 Code High viability
PHMForge: A Scenario-Driven Agentic Benchmark for Industrial Asset Lifecycle Maintenance Build Now
A benchmark for evaluating LLM agents in industrial maintenance tasks, revealing significant gaps in current capabilities.
Industrial AI Agents Apr 2 Code High viability
Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging Build Now
A model merging framework adapts large language models for clinical applications by preserving instruction-following ability and domain expertise, offering a scalable solution for resource-constrained healthcare.
LLM Adaptation Apr 2 Code High viability
Universal computational thermal imaging overcoming the ghosting effect Build Now
A universal computational thermal imaging framework that overcomes ghosting for high-fidelity night vision, enabling applications from autonomous navigation to healthcare.
Computer Vision Apr 2 Code High viability
Prototype-Based Low Altitude UAV Semantic Segmentation Build Now
An efficient prototype-based semantic segmentation framework for low-altitude UAV imagery that balances performance and computational efficiency.
Computer Vision Apr 2 Pending High viability
ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor Build Now
ZEUS accelerates diffusion model inference by up to 3.2x using a novel second-order predictor without architectural changes or training, maintaining perceptual quality.
Generative Models Acceleration Apr 2 Pending High viability
Cross-Domain Vessel Segmentation via Latent Similarity Mining and Iterative Co-Optimization Build Now
A novel domain transfer framework for retinal vessel segmentation that achieves state-of-the-art performance by leveraging latent vascular similarity and iterative co-optimization.
Medical AI Apr 2 Code High viability
DeltaMem: Towards Agentic Memory Management via Reinforcement Learning Build Now
DeltaMem is an agentic memory management system that uses reinforcement learning to significantly improve persona memory performance in conversational AI, outperforming existing product-level baselines.
Agentic Memory Management Apr 2 Code High viability
ReFlow: Self-correction Motion Learning for Dynamic Scene Reconstruction Build Now
A unified framework for monocular dynamic scene reconstruction that learns 3D motion through self-correction, improving stability and accuracy.
3D Reconstruction Apr 2 Code High viability
AnchorVLA: Anchored Diffusion for Efficient End-to-End Mobile Manipulation Build Now
A diffusion-based policy for mobile manipulation that enables efficient, reactive, and multimodal action generation with self-correction.
Robotics Apr 2 Pending High viability
VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification Build Now
A new benchmark and evaluation framework to rigorously test and improve the grounded spatio-temporal reasoning capabilities of video multimodal large language models.
Video LLMs Apr 2 Code High viability
Care-Conditioned Neuromodulation for Autonomy-Preserving Supportive Dialogue Agents Build Now
Develops a novel framework for supportive AI dialogue agents that prioritizes user autonomy while maintaining helpfulness, addressing relational risks like dependency and coercion.
Dialogue Agents Apr 2 Code High viability
Harmonized Tabular-Image Fusion via Gradient-Aligned Alternating Learning Build Now
A novel learning paradigm that aligns gradients between tabular and image data to improve multimodal fusion performance.
Multimodal Fusion Apr 2 Pending High viability
SHOE: Semantic HOI Open-Vocabulary Evaluation Metric Build Now
A new semantic evaluation metric for open-vocabulary human-object interaction detection that aligns better with human judgment, enabling more scalable and flexible assessment of AI models.
Computer Vision Evaluation Apr 2 Code High viability
NED-Tree: Bridging the Semantic Gap with Nonlinear Element Decomposition Tree for LLM Nonlinear Optimization Modeling Build Now
A framework that enables LLMs to accurately model complex nonlinear optimization problems by decomposing them into solver-compatible elements, with a new benchmark to drive progress.
LLM Optimization Apr 2 Code High viability
Mitigating the ID-OOD Tradeoff in Open-Set Test-Time Adaptation Build Now
A robust open-set test-time adaptation method that mitigates the ID-OOD tradeoff for improved model reliability in shifting environments.
Open-Set Test-Time Adaptation Apr 2 Code High viability
ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement Build Now
A two-phase framework that jointly trains LLMs to solve reasoning problems and refine their own answers, significantly improving accuracy on mathematical benchmarks.
LLM Reasoning and Refinement Apr 2 Code High viability
Optimizing EEG Graph Structure for Seizure Detection: An Information Bottleneck and Self-Supervised Learning Approach Build Now
A novel AI approach that learns denoised EEG graph structures and informative representations for improved seizure detection, offering clinically meaningful insights.
Medical AI Apr 2 Pending High viability
Learning from the Right Rollouts: Data Attribution for PPO-based LLM Post-Training Build Now
Accelerate LLM training and reduce unfaithful reasoning by intelligently filtering training data using influence scores.
LLM Post-Training Apr 2 Code High viability
Riemannian and Symplectic Geometry for Hierarchical Text-Driven Place Recognition Build Now
A novel framework for precise robot localization from text descriptions that leverages hierarchical geometric alignments and outperforms the state of the art by 19%.
Robotics Localization Apr 2 Code High viability
SteerFlow: Steering Rectified Flows for Faithful Inversion-Based Image Editing Build Now
SteerFlow enables high-fidelity, model-agnostic image editing by steering generative flow trajectories to preserve source details and achieve complex multi-turn edits.
Generative Image Editing Apr 2 Pending High viability
ThinknCheck: Grounded Claim Verification with Compact, Reasoning-Driven, and Interpretable Models Build Now
A compact, reasoning-driven AI model for grounded claim verification that achieves state-of-the-art accuracy with significantly fewer parameters.
LLM Reasoning and Verification Apr 2 Code High viability
Cognitive Energy Modeling for Neuroadaptive Human-Machine Systems using EEG and WGAN-GP Build Now
Leverages synthetic EEG data and a novel cognitive energy metric to build adaptive human-machine systems that respond in real time to user cognitive states.
Neuroadaptive Systems Apr 2 Code High viability
End-to-End Shared Attention Estimation via Group Detection with Feedback Refinement Build Now
An end-to-end system for estimating shared attention by simultaneously detecting groups of people and their focus points, outperforming existing methods.
Computer Vision Apr 2 Pending High viability
Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression Build Now
Swift-SVD offers a theoretically optimal and practically efficient method for compressing Large Language Models, achieving significant speedups in compression time.
LLM Compression Apr 2 Code High viability
MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction Build Now
A multimodal model that generates accurate and executable code from charts using reinforcement learning and self-correction.
Multimodal Coding Apr 2 Code High viability
CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning Build Now
CRIT provides a novel dataset and benchmark for training Vision-Language Models to perform complex, multi-hop reasoning across text and visual information, addressing hallucination and improving grounding.
Multimodal Reasoning Apr 2 Code High viability
Diffusion-Guided Adversarial Perturbation Injection for Generalizable Defense Against Facial Manipulations Build Now
A diffusion-guided defense system that injects adversarial perturbations into latent space to shield facial identities from GAN and diffusion-based deepfakes, offering robust protection in both white-box and black-box scenarios.
Adversarial Defense Apr 2 Code High viability
Label Shift Estimation With Incremental Prior Update Build Now
A post-hoc label shift estimation method that incrementally updates priors for any black-box probabilistic classifier, outperforming state-of-the-art methods.
Machine Learning Apr 2 Code High viability
Causal Scene Narration with Runtime Safety Supervision for Vision-Language-Action Driving Build Now
A novel approach to integrate diverse textual inputs for autonomous driving, improving driving performance and safety through causal scene narration and runtime supervision.
Autonomous Driving Apr 2 Code High viability
Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models Build Now
A novel routing mechanism for diffusion language models that significantly improves training efficiency and performance by adaptively allocating computation across denoising steps.
LLM Training Apr 2 Pending High viability
Posterior Optimization with Clipped Objective for Bridging Efficiency and Stability in Generative Policy Learning Ignore
POCO enhances generative policy learning by maintaining stability and efficiency in robotic manipulation through posterior optimization.
AI in Robotics Apr 2 Code
Can Video Diffusion Models Predict Past Frames? Bidirectional Cycle Consistency for Reversible Interpolation Watch
A video frame interpolation model that uses bidirectional cycle consistency to ensure temporal accuracy and reversibility, outperforming existing methods without added inference cost.
Video Generation Apr 2
AI in Insurance: Adaptive Questionnaires for Improved Risk Profiling Watch
An adaptive questionnaire framework using LLMs and alternative data to personalize insurance underwriting, improve user experience, and reduce fraud.
AI in Insurance Apr 2
RIFT: A RubrIc Failure Mode Taxonomy and Automated Diagnostics Watch
Automate the diagnosis of rubric quality issues in LLM benchmarks to improve evaluation reliability.
LLM Evaluation Apr 1 Code
Open-loop POMDP Simplification and Safe Skipping of Replanning with Formal Performance Guarantees Watch
A new framework for adaptive open-loop planning in POMDPs with formal performance guarantees, enabling faster and safer decision-making under uncertainty.
Robotics Planning Apr 1 Code
A soft and lightweight fabric-based pneumatic interface for multimodal fingertip tactile feedback Watch
Develops a lightweight, fabric-based pneumatic haptic interface for realistic tactile feedback in VR/AR and teleoperation.
Haptic Feedback Interfaces Apr 1 Code
Semantically Annotated Multimodal Dataset for RF Interpretation and Prediction Watch
A new multimodal dataset bridging RF signals with visual and lidar data to enable AI-driven wireless system design and RF-based perception.
RF Interpretation and Prediction Apr 1 Code
Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars Watch
A simplified approach to 3D avatar reconstruction by optimizing the body model rather than scaling network complexity, achieving state-of-the-art results with less computational overhead.
3D Avatar Reconstruction Apr 1 Code
Leveraging the Value of Information in POMDP Planning Watch
A novel planning algorithm for partially observable environments that intelligently filters information to improve decision-making efficiency.
Reinforcement Learning Apr 1 Code
Nonlinear Methods for Analyzing Pose in Behavioral Research Watch
A general-purpose pipeline for analyzing complex human pose data to extract meaningful behavioral insights.
Behavioral AI Apr 1 Code
Improving Latent Generalization Using Test-time Compute Watch
Train language models to use test-time compute for improved latent generalization and deductive reasoning.
LLM Reasoning Apr 1 Code
ReFormeR: Learning and Applying Explicit Query Reformulation Patterns Watch
A pattern-guided approach to query reformulation that elicits and applies explicit patterns to constrain LLM-based query generation for improved retrieval.
Information Retrieval Apr 1
AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks Watch
A benchmark to evaluate and mitigate novel privacy risks in human-centered AI agent social networks.
Agents Apr 1 Code
Read More, Think More: Revisiting Observation Reduction for Web Agents Watch
This research optimizes web agent performance by adaptively selecting observation representations based on model capability and token budget, and incorporating historical context.
Web Agents Apr 2 Code
RAE-AR: Taming Autoregressive Models with Representation Autoencoders Watch
A novel method to integrate powerful pre-trained visual encoders into autoregressive generative models, improving performance and unifying understanding and generation.
Generative Models Apr 2 Code
Variational LSTM with Augmented Inputs: Nonlinear Response History Metamodeling with Aleatoric and Epistemic Uncertainty Watch
A variational LSTM with augmented inputs and Monte Carlo dropout that quantifies both aleatoric and epistemic uncertainty in high-dimensional nonlinear dynamic structural systems while reducing computational burden.
Uncertainty Quantification in Structural Dynamics Apr 2 Code
Analysis of LLM Performance on AWS Bedrock: Receipt-item Categorisation Case Study Watch
A cost-aware evaluation framework for selecting the optimal LLM for receipt-item categorization on AWS Bedrock.
LLM Application Apr 2 Code
M3D-BFS: a Multi-stage Dynamic Fusion Strategy for Sample-Adaptive Multi-Modal Brain Network Analysis Watch
A novel dynamic fusion strategy for multi-modal brain network analysis that adaptively processes samples for improved performance.
Medical AI Apr 2 Code
EvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification Watch
A framework for LLM agents to autonomously generate complex, multi-file skills for professional tasks, improving performance and reducing manual effort.
LLM Agents Apr 2
Scale over Preference: The Impact of AI-Generated Content on Online Content Ecology Watch
This research analyzes the impact of AI-generated content on online platforms, revealing a scale-over-preference dynamic and advocating for AIGC-sensitive distribution algorithms.
AI Content Analysis Apr 2 Code
MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning Watch
A parameter-efficient fine-tuning method that adapts underutilized model subspaces for more efficient knowledge acquisition in LLMs.
LLM Fine-tuning Apr 2
OpenGo: An OpenClaw-Based Robotic Dog with Real-Time Skill Switching Watch
A robotic dog that can dynamically switch between skills in real-time, controlled by natural language instructions.
Robotics Apr 2
Transformer self-attention encoder-decoder with multimodal deep learning for response time series forecasting and digital twin support in wind structural health monitoring Watch
A transformer-based digital twin system for wind-induced structural response forecasting and early warning of structural changes in bridges.
Structural Health Monitoring Apr 2
Detecting Toxic Language: Ontology and BERT-based Approaches for Bulgarian Text Watch
A BERT-based system for nuanced toxic language detection in Bulgarian that preserves access to essential information.
Content Moderation AI Apr 2 Code
FSKD: Monocular Forest Structure Inference via LiDAR-to-RGBI Knowledge Distillation Watch
A knowledge distillation framework that uses RGBI imagery to infer high-resolution forest structure data, making detailed ecosystem monitoring more accessible and scalable.
Remote Sensing AI Apr 2
Ranking-Guided Semi-Supervised Domain Adaptation for Severity Classification Watch
A novel semi-supervised domain adaptation method for medical image severity classification that aligns class-specific rank score distributions.
Medical AI Apr 2 Code
Beyond Detection: Ethical Foundations for Automated Dyslexic Error Attribution Watch
An AI system that accurately attributes spelling errors to dyslexic writers, with a strong focus on ethical deployment guidelines for educational contexts.
Educational AI Apr 2 Code
Combining Boundary Supervision and Segment-Level Regularization for Fine-Grained Action Segmentation Watch
A lightweight dual-loss training framework that significantly improves fine-grained action segmentation quality with minimal architectural changes, making complex models more practical.
Action Segmentation Apr 2 Code
A Self supervised learning framework for imbalanced medical imaging datasets Watch
A self-supervised learning framework that improves medical image classification accuracy on imbalanced and scarce datasets.
Medical AI Apr 2 Code
Optimizing Interventions for Agent-Based Infectious Disease Simulations Watch
An AI system that optimizes non-pharmaceutical interventions for infectious disease simulations to minimize societal disruption.
Agent-Based Simulation Optimization Apr 2 Code
Diff-KD: Diffusion-based Knowledge Distillation for Collaborative Perception under Corruptions Watch
A diffusion-based knowledge distillation framework to improve collaborative perception in autonomous systems by actively recovering from sensor and communication corruptions.
Collaborative Perception Apr 2
Network Structure in UK Payment Flows: Evidence on Economic Interdependencies and Implications for Real-Time Measurement Watch
Leverage payment network analysis to provide leading indicators of economic change and improve real-time forecasting accuracy, especially during disruptions.
Economic Forecasting Apr 2 Code
AA-SVD: Anchored and Adaptive SVD for Large Language Model Compression Watch
A framework for compressing large language models without retraining by accounting for input distribution shifts and refining transformer blocks end-to-end.
LLM Compression Apr 2
PRO-SPECT: Probabilistically Safe Scalable Planning for Energy-Aware Coordinated UAV-UGV Teams in Stochastic Environments Watch
A risk-bounded planning algorithm for coordinated UAV-UGV teams that uses a UGV as a mobile charging station in stochastic environments.
Robotics Planning Apr 2 Code
A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems Watch
A two-stage framework predicts GPU resource utilization and power consumption in HPC systems for efficient scheduling and power management.
HPC Resource Management Apr 2 Code
ViT-Explainer: An Interactive Walkthrough of the Vision Transformer Pipeline Watch
An interactive web-based system for visualizing and understanding the end-to-end inference pipeline of Vision Transformers.
AI Explainability Apr 2
Impact of Multimodal and Conversational AI on Learning Outcomes and Experience Watch
A conversational AI system that integrates text and images to improve learning outcomes in STEM education.
Educational AI Apr 2
SCALE: Semantic- and Confidence-Aware Conditional Variational Autoencoder for Zero-shot Skeleton-based Action Recognition Watch
A novel framework for zero-shot skeleton-based action recognition that leverages conditional variational autoencoders and a confidence-aware energy loss to improve accuracy without explicit skeleton-text alignment.
Computer Vision Apr 2 Code
Do Emotions in Prompts Matter? Effects of Emotional Framing on Large Language Models Watch
An adaptive prompting framework that leverages emotional tone to subtly improve LLM performance on specific tasks.
LLM Prompting Apr 2 Code
(PAC-)Learning state machines from data streams: A generic strategy and an improved heuristic (Extended version) Watch
A generic strategy and improved heuristic for learning state machines from streaming data, implemented in an open-source library and demonstrating effectiveness in runtime, memory, and quality.
State Machine Learning Apr 2 Code
Bias Inheritance in Neural-Symbolic Discovery of Constitutive Closures Under Function-Class Mismatch Ignore
A neural-symbolic framework to robustly discover physical laws from data, addressing bias inheritance in scientific modeling.
Scientific Discovery Apr 1 Code
Benchmark Problems and Benchmark Datasets for the evaluation of Machine and Deep Learning methods on Photoplethysmography signals: the D4 report from the QUMPHY project Ignore
A curated set of benchmark problems and datasets for evaluating machine learning on photoplethysmography signals to quantify uncertainty in medical applications.
Medical AI Apr 1 Code
EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild Ignore
A new benchmark for evaluating binary function similarity detection models to uncover critical generalization gaps in software security.
Software Security Apr 2 Pending
Boosting Vision-Language-Action Finetuning with Feasible Action Neighborhood Prior Ignore
A novel regularization technique for vision-language-action models to improve sample efficiency and generalization in robotic manipulation by exploiting the inherent neighborhood of feasible actions.
Robotics AI Apr 2
Pseudo-Quantized Actor-Critic Algorithm for Robustness to Noisy Temporal Difference Error Ignore
A novel reinforcement learning algorithm that robustly handles noisy temporal difference errors for more stable and efficient learning.
Reinforcement Learning Apr 2 Code
AI-Assisted Hardware Security Verification: A Survey and AI Accelerator Case Study Ignore
Leveraging AI and LLMs to automate and accelerate hardware security verification processes for complex systems.
Hardware Security Verification Apr 2 Code
ModTrans: Translating Real-world Models for Distributed Training Simulator Ignore
ModTrans enables the use of real-world machine learning models within distributed training simulators, bridging the gap between ML development and system research.
ML Systems Apr 2
Reproducible, Explainable, and Effective Evaluations of Agentic AI for Software Engineering Ignore
This paper proposes guidelines and a proof-of-concept for reproducible and explainable evaluations of AI agents in software engineering by making agent trajectories publicly accessible.
Agents Apr 1 Code
When AI Gets it Wrong: Reliability and Risk in AI-Assisted Medication Decision Systems Ignore
This research analyzes the reliability and failure modes of AI medication decision systems to mitigate risks and improve patient safety in healthcare.
Medical AI Apr 1 Code
Are Finer Citations Always Better? Rethinking Granularity for Attributed Generation Ignore
This research optimizes citation granularity in attributed generation models to improve attribution quality and answer correctness, finding that intermediate granularities perform best.
Attributed Generation Apr 1 Code
Soft MPCritic: Amortized Model Predictive Value Iteration Ignore
A framework combining reinforcement learning and model predictive control for practical and scalable policy synthesis in complex control tasks.
Reinforcement Learning for Control Apr 1
LLM Agents as Social Scientists: A Human-AI Collaborative Platform for Social Science Automation Ignore
A platform that uses LLM agents to automate social science research by simulating human behavior and generating reports.
Agents Apr 2
Satellite-Free Training for Drone-View Geo-Localization Ignore
A framework for training drone geo-localization models without relying on satellite imagery, enabling deployment in GPS-denied or data-restricted environments.
Geo-localization Apr 2
Non-Rigid 3D Shape Correspondences: From Foundations to Open Challenges and Opportunities Ignore
A survey of methods for estimating correspondences between deformed 3D shapes, highlighting recent advances and future research directions.
3D Shape Analysis Apr 1 Code
Random Coordinate Descent on the Wasserstein Space of Probability Measures Ignore
A novel randomized coordinate descent framework for efficient optimization over probability measures, offering significant speedups over traditional methods for machine learning and mean-field modeling.
Optimization Algorithms Apr 2 Code
Know Your Streams: On the Conceptualization, Characterization, and Generation of Intentional Event Streams Ignore
A prototype generator for creating realistic event streams to benchmark and improve streaming process mining algorithms.
Data Streaming Apr 1 Code
No Attacker Needed: Unintentional Cross-User Contamination in Shared-State LLM Agents Ignore
This research identifies and quantifies a critical security flaw in shared-state LLM agents where benign user interactions can unintentionally corrupt outcomes for other users, requiring new artifact-level defenses.
LLM Agents Apr 1
Acoustic and perceptual differences between standard and accented Chinese speech and their voice clones Ignore
This research investigates the perceptual differences in voice cloning for standard versus accented Chinese speech, revealing that accent significantly impacts perceived identity and intelligibility.
Voice Cloning Apr 2
Residuals-based Offline Reinforcement Learning Ignore
A new offline reinforcement learning framework that explicitly accounts for estimation error in transition dynamics to improve policy optimization.
Offline Reinforcement Learning Apr 1 Code
Identifying and Estimating Causal Direct Effects Under Unmeasured Confounding Ignore
A statistical method to identify causal direct effects in the presence of unmeasured confounding, applicable to vaccine studies.
Causal Inference Apr 2 Code
JetPrism: diagnosing convergence for generative simulation and inverse problems in nuclear physics Ignore
A diagnostic framework for generative simulation and inverse problems that ensures precise statistical agreement with ground-truth data, applicable across various scientific and financial domains.
Generative Simulation Apr 1 Code
Scaling Reasoning Tokens via RL and Parallel Thinking: Evidence From Competitive Programming Ignore
A system that scales reasoning token budgets for competitive programming using RL and parallel thinking to significantly improve performance on hard problems.
Reasoning Enhancement Apr 1
Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training Ignore
This research compares gradient-free Evolution Strategies with gradient-based RL for LLM fine-tuning, revealing distinct solution geometries that impact knowledge preservation.
LLM Post-Training Apr 2 Pending
An Online Machine Learning Multi-resolution Optimization Framework for Energy System Design Limit of Performance Analysis Ignore
An ML-accelerated framework to optimize energy system design by bridging architectural and operational performance gaps, reducing high-fidelity model evaluations.
Energy System Optimization Apr 1 Code
What Do Claim Verification Datasets Actually Test? A Reasoning Trace Analysis Ignore
This research analyzes claim verification datasets to reveal their limitations in testing complex reasoning, suggesting improvements for more robust AI evaluation.
AI Evaluation Apr 2 Code
True to Tone? Quantifying Skin Tone Fidelity and Bias in Photographic-to-Virtual Human Pipelines Ignore
A scalable methodology to evaluate and improve skin tone accuracy and fairness in virtual human rendering pipelines.
Virtual Humans Apr 2
Coupled Query-Key Dynamics for Attention Ignore
A novel attention mechanism that improves language model training stability and efficiency by jointly evolving queries and keys.
LLM Optimization Apr 2
Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models Ignore
A framework for generating 3D adversarial textures to test and improve the robustness of vision-language-action models in robotic manipulation.
Robotics AI Apr 2
DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72 Ignore
A novel inference parallelization strategy for LLMs that improves performance by offloading MoE weights and enabling independent GPU execution.
LLM Inference Optimization Apr 2
Analysis of Efficient Transmission Methods of Grid Maps for Intelligent Vehicles Ignore
Develops a communication pipeline for efficient transmission of grid-map data for intelligent vehicles.
Intelligent Vehicle Perception Apr 2
Integrated Identification of Collaborative Robots for Robot Assisted 3D Printing Processes Ignore
A model-based approach to improve precision and control in robot-assisted 3D printing by identifying system parameters.
Robotics for Additive Manufacturing Apr 2
AstroConcepts: A Large-Scale Multi-Label Classification Corpus for Astrophysics Ignore
A new corpus and evaluation framework for tackling extreme class imbalance in scientific text classification, enabling more robust NLP models for specialized domains.
Scientific NLP Apr 2 Code
Resonance4D: Frequency-Domain Motion Supervision for Preset-Free Physical Parameter Learning in 4D Dynamic Physical Scene Simulation Ignore
A framework for physics-driven 4D dynamic simulation that uses dual-domain motion supervision to improve physical fidelity and reduce computational cost.
4D Simulation Apr 2
Bayesian Elicitation with LLMs: Model Size Helps, Extra "Reasoning" Doesn't Always Ignore
This research investigates the accuracy and uncertainty estimation of LLMs for Bayesian elicitation, finding that model size is key but reasoning effort is not, and that statistical correction is needed for reliable decision-making.
LLM Uncertainty Quantification Apr 2
Koopman-Based Nonlinear Identification and Adaptive Control of a Turbofan Engine Ignore
Develops a Koopman operator-based adaptive control system for turbofan engines to improve robustness and thrust response.
Aerospace Control Apr 2 Code
Model-Based Reinforcement Learning for Control under Time-Varying Dynamics Ignore
An adaptive reinforcement learning algorithm that maintains control performance in systems with changing dynamics by intelligently managing historical data.
Model-Based RL Apr 2 Code
Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance Ignore
This research reveals a fundamental geometric reasoning gap in current Vision-Language Models, highlighting a need for improved spatial invariance in future multimodal systems.
Vision-Language Models Apr 2 Code
Solving the Two-dimensional single stock size Cutting Stock Problem with SAT and MaxSAT Ignore
A SAT-based framework for optimizing the 2D cutting stock problem, outperforming existing solvers on benchmark instances.
Operations Research / Optimization Apr 2 Code
Architectural Implications of the UK Cyber Security and Resilience Bill Ignore
A framework for achieving compliance with the UK Cyber Security and Resilience Bill using Zero Trust Architecture.
Cybersecurity Architecture Apr 2 Code
When to ASK: Uncertainty-Gated Language Assistance for Reinforcement Learning Ignore
An RL agent that uses smaller language models to suggest actions only when it's uncertain, improving out-of-distribution performance without retraining.
Reinforcement Learning Apr 2 Code
From High-Dimensional Spaces to Verifiable ODD Coverage for Safety-Critical AI-based Systems Ignore
A structured engineering method to verify Operational Design Domain coverage for safety-critical AI systems, meeting aviation certification standards.
AI Safety & Certification Apr 2 Code
A Graph Neural Network Approach for Solving the Ranked Assignment Problem in Multi-Object Tracking Ignore
A Graph Neural Network approach to improve data association in multi-object tracking for autonomous vehicles.
Multi-Object Tracking Apr 2
Auction-Based Online Policy Adaptation for Evolving Objectives Ignore
An auction-based framework for reinforcement learning agents to dynamically adapt to changing objectives by bidding for action execution.
Multi-Objective Reinforcement Learning Apr 2
Probabilistic classification from possibilistic data: computing Kullback-Leibler projection with a possibility distribution Ignore
A novel method for training multi-class classifiers using graded plausibility supervision, improving predictive performance by projecting predictions onto admissible probability distributions.
Machine Learning Theory Apr 2 Code
The Self Driving Portfolio: Agentic Architecture for Institutional Asset Management Ignore
An agentic AI pipeline automates strategic asset allocation by having specialized agents generate, critique, and improve portfolio construction methods, guided by an investment policy statement.
AI Agents for Finance Apr 2
Physics-Informed Transformer for Multi-Band Channel Frequency Response Reconstruction Ignore
A physics-informed Transformer reconstructs wireless channel frequency responses from fragmented spectrum data, outperforming classical methods in accuracy.
Wireless Communication AI Apr 2
COMPASS: Complete Multimodal Fusion via Proxy Tokens and Shared Spaces for Ubiquitous Sensing Ignore
A framework for robust multimodal sensing that synthesizes proxy tokens for missing modalities to ensure complete fusion.
Multimodal AI Apr 2
Neural network methods for two-dimensional finite-source reflector design Ignore
Leveraging neural networks for faster and more accurate design of optical reflectors to precisely shape light.
Optical Design AI Apr 2 Code
A virtual-variable-length method for robust inverse kinematics of multi-segment continuum robots Ignore
A novel method for robustly solving inverse kinematics in multi-segment continuum robots, improving convergence rates and reducing iteration counts.
Robotics Apr 2 Code
Reliable News or Propagandist News? A Neurosymbolic Model Using Genre, Topic, and Persuasion Techniques to Improve Robustness in Classification Ignore
A neurosymbolic model enhances news classification robustness by combining text embeddings with symbolic features like genre, topic, and persuasion techniques.
NLP Classification Apr 2 Code
Learning in Prophet Inequalities with Noisy Observations Ignore
Develops algorithms for online decision-making with noisy observations by integrating learning and decision-making.
Online Decision Making Apr 2 Code
Multi-Agent Video Recommenders: Evolution, Patterns, and Open Challenges Ignore
Develops multi-agent architectures for video recommendation systems that leverage foundation models and LLMs for more precise and explainable content delivery.
Multi-Agent Recommender Systems Apr 2 Code
Why Gaussian Diffusion Models Fail on Discrete Data? Ignore
This research identifies and proposes solutions for fundamental limitations of diffusion models in generating discrete data, with validated improvements across text, code, and protein generation.
Generative Models Apr 2 Code
PAC-Bayesian Reward-Certified Outcome Weighted Learning Ignore
A novel PAC-Bayesian framework for robustly estimating individualized treatment rules by accounting for reward uncertainty, leading to improved treatment regime selection.
Reinforcement Learning Apr 2 Code
ROS 2-Based LiDAR Perception Framework for Mobile Robots in Dynamic Production Environments, Utilizing Synthetic Data Generation, Transformation-Equivariant 3D Detection and Multi-Object Tracking Ignore
A LiDAR perception framework for mobile robots in dynamic production environments that improves 6D pose estimation and multi-object tracking using synthetic data and a novel detection method.
Robotics Perception Apr 2
Demographic Parity Tails for Regression Ignore
A new framework for regression fairness that targets specific distribution tails, offering more nuanced and context-sensitive interventions.
Fairness in ML Apr 2 Code
BVFLMSP: Bayesian Vertical Federated Learning for Multimodal Survival with Privacy Ignore
A Bayesian federated learning framework for private multimodal survival prediction with uncertainty estimates.
Privacy-Preserving AI Apr 2
Intelligent Cloud Orchestration: A Hybrid Predictive and Heuristic Framework for Cost Optimization Ignore
A hybrid framework combining predictive and heuristic methods to optimize cloud infrastructure costs and response times.
Cloud Cost Optimization Apr 2
Country-wide, high-resolution monitoring of forest browning with Sentinel-2 Ignore
A scalable approach for country-wide mapping of forest greenness anomalies using satellite data to detect disturbances.
Environmental Monitoring Apr 2
Prosodic ABX: A Language-Agnostic Method for Measuring Prosodic Contrast in Speech Representations Ignore
A language-agnostic method to measure prosodic contrast in speech representations using a novel ABX task and a released dataset.
Speech AI Apr 2 Code
Bridging Discrete Planning and Continuous Execution for Redundant Robot Ignore
A framework to improve the continuous execution quality of robot path planning by bridging discrete planning and inverse kinematics.
Robotics Apr 2
Can Heterogeneous Language Models Be Fused? Ignore
A method to merge diverse language models from different architectures into a single, more capable model.
LLM Fusion Apr 2
CXR-LT 2026 Challenge: Projection-Aware Multi-Label and Zero-Shot Chest X-Ray Classification Ignore
A novel dual-branch architecture for chest X-ray classification that improves multi-label and zero-shot performance by integrating projection-specific models and contrastive learning.
Medical AI Apr 2
Investigating Permutation-Invariant Discrete Representation Learning for Spatially Aligned Images Ignore
A novel autoencoder learns position-independent image representations for faster, direct image interpolation and synthesis.
Generative Image Models Apr 2
Is Clinical Text Enough? A Multimodal Study on Mortality Prediction in Heart Failure Patients Ignore
Develops a multimodal transformer model for improved short-term mortality prediction in heart failure patients by combining clinical text and structured EHR data.
Medical AI Apr 2
Macroscopic transport patterns of UAV traffic in 3D anisotropic wind fields: A constraint-preserving hybrid PINN-FVM approach Ignore
A hybrid physics-informed neural network and finite-volume method for simulating UAV traffic in complex wind fields.
UAV Traffic Simulation Apr 1
Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial Ignore
A tutorial on Bayesian Optimization for accelerating scientific discovery by automating the hypothesize-experiment-refine cycle.
Scientific Discovery Automation Apr 1
Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning Ignore
A theoretical framework for estimating counterfactual gradients in adaptive inverse reinforcement learning using Malliavin calculus.
Reinforcement Learning Apr 1
Safety, Security, and Cognitive Risks in World Models Ignore
This paper identifies and formalizes safety, security, and cognitive risks in world models, proposing mitigations for critical AI deployments.
AI Safety & Security Apr 1
Semantic Modeling for World-Centered Architectures Ignore
A new framework for multi-agent systems that uses a shared world representation for improved consistency and stability.
Agent Architectures Apr 1
Crashing Waves vs. Rising Tides: Preliminary Findings on AI Automation from Thousands of Worker Evaluations of Labor Market Tasks Ignore
This research analyzes the continuous and broad-based nature of AI automation in labor market tasks, contrasting with abrupt 'crashing waves' of capability surges.
LLM Evaluation Apr 1
"The System Will Choose Security Over Humanity Every Time": Understanding Security and Privacy for U.S. Incarcerated Users Ignore
This research identifies critical privacy and security vulnerabilities in digital devices used by incarcerated individuals, highlighting the need for user-centric design and policy reform.
Privacy and Security in Incarcerated Settings Apr 1
Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models Ignore
This research investigates internal mechanisms within language models for entity-centric factual questions, identifying specific neurons responsible for entity recall.
LLM Interpretability Apr 1
Test-Time Scaling Makes Overtraining Compute-Optimal Ignore
Develops new scaling laws for LLM pretraining that optimize for end-to-end compute budgets, including inference costs, leading to overtrained models with improved performance.
LLM Training Apr 1
The power of context: Random Forest classification of near synonyms. A case study in Modern Hindi Ignore
This research quantitatively demonstrates that word usage patterns in Hindi preserve etymological signals, distinguishing between Sanskrit and Perso-Arabic origins of synonyms.
NLP Linguistics Apr 1
Preserving Target Distributions With Differentially Private Count Mechanisms Ignore
A novel framework for differentially private count mechanisms that balances distribution accuracy with count accuracy and runtime.
Differential Privacy Apr 1
When Reward Hacking Rebounds: Understanding and Mitigating It with Representation-Level Signals Ignore
A method to mitigate reward hacking in LLMs by penalizing shortcut behaviors during training.
LLM Training Apr 1
Distal-Stable Beam for Continuum Robots Ignore
A novel geometric design for continuum robots that significantly improves distal stiffness for enhanced precision in constrained environments.
Robotics Apr 1
Non-monotonicity in Conformal Risk Control Ignore
Develops theoretical guarantees for conformal risk control under non-monotonic loss functions, improving stability in practical applications.
Risk Control Apr 2
Magic, Madness, Heaven, Sin: LLM Output Diversity is Everything, Everywhere, All at Once Ignore
A new framework for understanding and evaluating LLM output diversity across different task objectives, revealing trade-offs between safety, representation, and creativity.
LLM Evaluation Apr 2 Code
Why Instruction-Based Unlearning Fails in Diffusion Models? Ignore
This research reveals a fundamental limitation in unlearning concepts from diffusion models using only natural language instructions, suggesting a need for new intervention methods.
Diffusion Model Control Apr 2
Does Your Optimizer Care How You Normalize? Normalization-Optimizer Coupling in LLM Training Ignore
This research investigates the detrimental interaction between specific normalization layers and optimizers in LLM training, revealing failure modes that can be mitigated through parameter adjustments or EMA blending.
LLM Training Apr 2
Feature Weighting Improves Pool-Based Sequential Active Learning for Regression Ignore
Enhances active learning for regression by weighting features to improve sample selection accuracy.
Active Learning Apr 2
MTI: A Behavior-Based Temperament Profiling System for AI Agents Ignore
A novel behavior-based system to profile AI agent temperaments across four key axes, independent of model capability.
AI Agent Profiling Apr 2
Application of parametric Shallow Recurrent Decoder Network to magnetohydrodynamic flows in liquid metal blankets of fusion reactors Ignore
A data-driven framework using a recurrent neural network for reconstructing magnetohydrodynamic flows in fusion reactor blankets from sparse measurements.
Fusion Reactor AI Apr 2
Efficient Constraint Generation for Stochastic Shortest Path Problems Ignore
A new algorithm for stochastic shortest path problems that significantly reduces computation by intelligently pruning actions based on heuristics, leading to faster problem solving.
Optimization Algorithms Apr 2
Grounding AI-in-Education Development in Teachers' Voices: Findings from a National Survey in Indonesia Ignore
This research identifies teacher needs and AI usage patterns in Indonesian education to inform the development of context-appropriate AI tools and policies.
AI in Education Apr 2
BraiNCA: brain-inspired neural cellular automata and applications to morphogenesis and motor control Ignore
A brain-inspired neural cellular automata with attention for improved robustness and learning speed in self-organization tasks.
Generative Models Apr 2
How and why does deep ensemble coupled with transfer learning increase performance in bipolar disorder and schizophrenia classification? Ignore
Investigating the theoretical underpinnings of how transfer learning and deep ensembles improve psychiatric disorder classification.
Medical AI Apr 2
Blinded Radiologist and LLM-Based Evaluation of LLM-Generated Japanese Translations of Chest CT Reports: Comparative Study Ignore
This study evaluates LLM-generated translations of radiology reports, finding significant discrepancies between LLM and human expert evaluations, highlighting the continued need for human oversight in medical contexts.
Medical AI Apr 2
3-D Relative Localization for Multi-Robot Systems with Angle and Self-Displacement Measurements Ignore
A novel framework for 3D relative robot localization using angle and self-displacement measurements, addressing noise and optimization challenges.
Robotics Localization Apr 2
Systematic Analyses of Reinforcement Learning Controllers in Signalized Urban Corridors Ignore
This paper explores the theoretical capacity regions of different reinforcement learning controllers for urban traffic networks, comparing them to a classical baseline.
Reinforcement Learning for Traffic Control Apr 2
go-$m$HC: Direct Parameterization of Manifold-Constrained Hyper-Connections via Generalized Orthostochastic Matrices Ignore
A new mathematical framework for parameterizing layer connectivity in large language models that scales efficiently and improves expressivity.
LLM Training Apr 2
Learning Spatial Structure from Pre-Beamforming Per-Antenna Range-Doppler Radar Data via Visibility-Aware Cross-Modal Supervision Ignore
Learning spatial structure directly from raw radar data for automotive perception.
Automotive Perception Apr 2
How to measure the optimality of word or gesture order with respect to the principle of swap distance minimization Ignore
A theoretical framework to measure the optimality of word and gesture order based on swap distance minimization.
Linguistic Analysis Apr 2
Learn by Surprise, Commit by Proof Ignore
A self-gated post-training framework for autonomous knowledge acquisition in language models, learning only what the model doesn't know and sharpening existing knowledge to reduce hallucinations.
LLM Training Apr 2
Beyond the Fold: Quantifying Split-Level Noise and the Case for Leave-One-Dataset-Out AU Evaluation Ignore
A novel evaluation protocol for facial action unit detection that quantifies noise and improves robustness, revealing that many reported gains may be artifacts of the testing method.
AI Evaluation & Benchmarking Apr 2 Code
Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks Ignore
Adapting large language models for vision tasks through a novel bridge training method.
Vision-Language Models Apr 2
On the Role of Depth in the Expressivity of RNNs Ignore
This paper theoretically analyzes the benefits of depth in Recurrent Neural Networks, showing it enhances memory capacity and expressivity.
RNN Theory Apr 2
The AnIML Ontology: Enabling Semantic Interoperability for Large-Scale Experimental Data in Interconnected Scientific Labs Ignore
Develops an ontology to improve semantic interoperability for scientific experimental data, addressing inconsistencies in existing standards.
Scientific Data Interoperability Apr 2
A Novel Theoretical Analysis for Clustering Heteroscedastic Gaussian Data without Knowledge of the Number of Clusters Ignore
A theoretical framework for clustering heteroscedastic Gaussian data with a novel cost function and Wald kernel, leading to a new algorithm called CENTRE-X.
Clustering Algorithms Apr 2
Thinking While Listening: Fast-Slow Recurrence for Long-Horizon Sequential Modeling Ignore
A novel recurrent latent modeling approach for improved long-horizon sequential data representation and generalization.
Sequential Modeling Apr 2
Domain-constrained knowledge representation: A modal framework Ignore
A new framework for knowledge graphs that treats domain as a core part of representation, enabling disambiguation and cross-domain relations.
Knowledge Representation Apr 2
Dual-Attention Based 3D Channel Estimation Ignore
A deep learning approach for more accurate 3D channel estimation in MIMO systems.
Wireless Communication AI Apr 2
Do Large Language Models Mentalize When They Teach? Ignore
This research investigates whether Large Language Models exhibit mentalizing behavior when teaching, using cognitive models to analyze their decision-making processes in a simulated learning environment.
LLM Teaching & Cognition Apr 2
DDCL: Deep Dual Competitive Learning: A Differentiable End-to-End Framework for Unsupervised Prototype-Based Representation Learning Ignore
A novel differentiable framework for unsupervised prototype-based representation learning that unifies feature extraction and cluster assignment.
Unsupervised Representation Learning Apr 2
Ontology-Aware Design Patterns for Clinical AI Systems: Translating Reification Theory into Software Architecture Ignore
Develop resilient clinical AI systems by applying ontology-aware design patterns to mitigate data distortion and semantic drift.
Clinical AI Architecture Apr 2 Code
Training In-Context and In-Weights Mixtures Via Contrastive Context Sampling Ignore
A novel training strategy for LLMs that balances in-context and in-weights learning by using contrastive context sampling to improve performance and prevent label copying.
LLM Training Apr 2
The Digital Twin Counterfactual Framework: A Validation Architecture for Simulated Potential Outcomes Ignore
A framework for validating simulated counterfactual outcomes in causal inference by introducing a digital twin and a hierarchical validation architecture.
Causal Inference Apr 1
Perceptual misalignment of texture representations in convolutional neural networks Ignore
This research investigates the disconnect between how convolutional neural networks represent textures and human perception, suggesting current object recognition models are insufficient for understanding texture perception.
Computer Vision Research Apr 1
Assessing Pause Thresholds for empirical Translation Process Research Ignore
This paper proposes a novel method for analyzing typing pauses in translation to understand cognitive processes, but lacks immediate commercial application.
Translation Process Research Apr 1
A Dynamic Atlas of Persian Poetic Symbolism: Families, Fields, and the Historical Rewiring of Meaning Ignore
This research maps the historical evolution of symbolic meaning in Persian poetry, identifying recurring themes and their changing relationships over centuries.
Digital Humanities / NLP Apr 1
Topology-Hiding Connectivity-Assurance for QKD Inter-Networking Ignore
A protocol for proving secure connections in quantum key distribution networks without revealing network topology.
Quantum Cryptography Apr 2
DDCL-INCRT: A Self-Organising Transformer with Hierarchical Prototype Structure (Theoretical Foundations) Ignore
A theoretical framework for self-organizing transformer architectures that minimize computational resources by dynamically determining their structure during training.
LLM Architecture Apr 2
Qiana: A First-Order Formalism to Quantify over Contexts and Formulas with Temporality Ignore
A formal logic system for reasoning about context-dependent and temporal information.
Logic Frameworks Apr 2
Generalization Bounds and Statistical Guarantees for Multi-Task and Multiple Operator Learning with MNO Networks Ignore
Develops theoretical generalization bounds for multiple operator learning architectures, specifically MNO networks, to understand sample complexity for unseen operator instances.
Operator Learning Apr 2
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Ignore
This paper surveys the concept of latent space in language models, exploring its foundations, evolution, mechanisms, abilities, and future outlook.
LLM Research Apr 2
Tracking the emergence of linguistic structure in self-supervised models learning from speech Ignore
This research investigates the emergence of linguistic structure within self-supervised speech models, analyzing layerwise patterns and learning trajectories.
Speech Representation Learning Apr 2
Generative AI Spotlights the Human Core of Data Science: Implications for Education Ignore
This paper argues that generative AI should shift data science education toward uniquely human reasoning skills rather than be used to automate them.
AI Education Apr 2
Best-Arm Identification with Noisy Actuation Ignore
Develops theoretical communication schemes for multi-armed bandit problems with noisy actuation, linking to channel capacity.
Reinforcement Learning Apr 2