OmniMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory Build Now
OmniMem is an autonomous multimodal memory system enhancing AI agents' lifelong memory with a 23-stage autoresearch pipeline.
AI Memory Systems Apr 1 Pending High viability
Learning Humanoid Navigation from Human Data Build Now
EgoNav enables humanoid robots to autonomously navigate diverse environments using human walking data, bypassing traditional robot-specific data collection.
Humanoid Robotics Apr 1 Code High viability
YieldSAT: A Multimodal Benchmark Dataset for High-Resolution Crop Yield Prediction Build Now
YieldSAT is a multimodal benchmark dataset for high-resolution crop yield prediction, aimed at improving agricultural productivity.
Agriculture Technology Apr 1 Code High viability
Risk-Aware Batch Testing for Performance Regression Detection Build Now
A CI tool that optimizes performance regression testing with risk-aware batch strategies, reported to save over $490K annually in infrastructure costs.
Risk Management & CI Optimization Mar 31 Code High viability
Internal APIs Are All You Need: Shadow APIs, Shared Discovery, and the Case Against Browser-First Agent Architectures Build Now
Unbrowse transforms web interaction for agents by converting redundant browser discoveries into a shared API index, vastly improving speed and efficiency.
AI Middleware & Tools Apr 1 Code High viability
BloClaw: An Omniscient, Multi-Modal Agentic Workspace for Next-Generation Scientific Discovery Build Now
BloClaw is a robust AI operating system for scientific discovery, enhancing data visualization and computational workflows with state-driven interfaces.
AI4S Operating Systems Apr 1 Pending High viability
HippoCamp: Benchmarking Contextual Agents on Personal Computers Watch
HippoCamp is a benchmark evaluating digital assistants' capabilities in managing personal file systems for enhanced user-specific reasoning.
Personal Computing Agents Apr 1 Code
In harmony with gpt-oss Watch
Robust coding tool harnesses built on gpt-oss for integrated AI development environments.
AI/ML Model Tools Apr 1 Pending
CliffSearch: Structured Agentic Co-Evolution over Theory and Code for Scientific Algorithm Discovery Watch
CliffSearch offers an AI-driven, evolutionary framework that enhances scientific algorithm discovery by ensuring correctness and originality through agent-based mutation and review processes.
Scientific Algorithm Discovery Apr 1 Code
LLM REgression with a Latent Iterative State Head Watch
RELISH enhances LLM-based regression by iteratively refining latent states for precise numerical predictions.
Machine Learning Regression Apr 1 Code
SYNTHONY: A Stress-Aware, Intent-Conditioned Agent for Deep Tabular Generative Models Selection Build Now
SYNTHONY selects optimal deep tabular generative models based on dataset stress profiles and user intent, outperforming LLM selectors.
Tabular Data Synthesis Mar 31 Code High viability
Excite, Attend and Segment (EASe): Domain-Agnostic Fine-Grained Mask Discovery with Feature Calibration and Self-Supervised Upsampling Build Now
An unsupervised, domain-agnostic framework for fine-grained semantic segmentation that discovers masks in complex scenes using feature calibration and self-supervised upsampling.
Unsupervised Segmentation Mar 31 Code High viability
LLM Essay Scoring Under Holistic and Analytic Rubrics: Prompt Effects and Bias Build Now
An LLM-based essay scoring system that identifies and corrects systematic biases in grading, improving accuracy with minimal data.
LLM Applications Mar 31 Code High viability
VADMamba++: Efficient Video Anomaly Detection via Hybrid Modeling in Grayscale Space Build Now
An efficient video anomaly detection method that combines Mamba, CNN, and Transformer modules for grayscale-to-RGB reconstruction.
Video Anomaly Detection Apr 1 Code High viability
Robust Multimodal Safety via Conditional Decoding Build Now
A conditional decoding strategy that significantly improves multimodal LLM safety by predicting a binary safety token before response generation, without external classifiers or modality-specific fine-tuning.
Multimodal LLM Safety Mar 31 Code High viability
Unsupervised 4D Flow MRI Velocity Enhancement and Unwrapping Using Divergence-Free Neural Networks Build Now
An unsupervised neural network for enhancing and unwrapping velocity fields in 4D Flow MRI data.
Medical AI Mar 31 Code High viability
Execution-Verified Reinforcement Learning for Optimization Modeling Build Now
An AI framework that uses a mathematical solver as a verifier to automate optimization modeling from natural language, reducing the need for costly process supervision and enabling cross-solver generalization.
LLM Optimization Apr 1 Code High viability
A Study on the Impact of Fault Localization Granularity for Repository-Scale Code Repair Tasks Build Now
A framework for investigating how fault localization granularity impacts automatic code repair at the repository scale, offering a proof of concept for optimizing this process.
Code Repair Mar 31 Code High viability
Out of Sight, Out of Track: Adversarial Attacks on Propagation-based Multi-Object Trackers via Query State Manipulation Build Now
A novel attack framework for multi-object tracking systems that exploits query propagation vulnerabilities to cause track terminations and identity switches.
Computer Vision Apr 1 Code High viability
Neural Reconstruction of LiDAR Point Clouds under Jamming Attacks via Full-Waveform Representation and Simultaneous Laser Sensing Build Now
An AI system that reconstructs authentic LiDAR point clouds under jamming attacks by analyzing full-waveform data.
LiDAR Security Apr 1 Code High viability
RawGen: Learning Camera Raw Image Generation Build Now
A diffusion-based framework for generating camera raw images from text or sRGB inputs, enabling scalable synthetic data for downstream vision tasks.
Generative Imaging Mar 31 Code High viability
Unified Architecture Metamodel of Information Systems Developed by Generative AI Build Now
A unified architecture metamodel for LLM-oriented applications to automate and standardize the software development lifecycle.
AI Development Tools Mar 31 Code High viability
Real Time Local Wind Inference for Robust Autonomous Navigation Build Now
Real-time local wind inference using LiDAR and deep learning enables robust autonomous navigation for aerial robots, improving energy efficiency and obstacle avoidance.
Autonomous Navigation Apr 1 Code High viability
Label-efficient underwater species classification with semi-supervised learning on frozen foundation model embeddings Build Now
Label-efficient underwater species classification using semi-supervised learning on frozen foundation model embeddings, requiring no model training.
Computer Vision Mar 31 Code High viability
Dynamic Graph Neural Network with Adaptive Features Selection for RGB-D Based Indoor Scene Recognition Build Now
An AI model that uses dynamic graphs and adaptive feature selection from RGB-D data for more accurate indoor scene recognition.
Indoor Scene Recognition Apr 1 Code High viability
ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving Build Now
An adaptive LLM serving router that enforces budgets, adapts to non-stationary conditions, and allows hot-swapping of models.
LLM Serving Mar 31 Pending High viability
All Roads Lead to Rome: Incentivizing Divergent Thinking in Vision-Language Models Build Now
Multi-Group Policy Optimization (MUPO) incentivizes divergent thinking in Vision-Language Models to overcome diversity collapse and improve reasoning capabilities.
Vision-Language Models Apr 1 Code High viability
COTTA: Context-Aware Transfer Adaptation for Trajectory Prediction in Autonomous Driving Build Now
Context-aware transfer adaptation for trajectory prediction in autonomous driving, significantly reducing prediction error in new geographic domains.
Autonomous Driving Apr 1 Code High viability
UCMNet: Uncertainty-Aware Context Memory Network for Under-Display Camera Image Restoration Build Now
A lightweight AI model that restores image quality for under-display cameras by adaptively processing regions based on estimated uncertainty.
Image Restoration Apr 1 Code High viability
VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications Build Now
An agentic framework that iteratively synthesizes and repairs formal software specifications using LLMs and verification feedback to ensure correctness and completeness.
AI Agents Mar 31 Code High viability
Speeding Up Mixed-Integer Programming Solvers with Sparse Learning for Branching Build Now
Accelerates mixed-integer programming solvers by using interpretable, sparse learning models for branching decisions, outperforming GPU-accelerated methods.
Optimization Mar 31 Code High viability
Lévy-Flow Models: Heavy-Tail-Aware Normalizing Flows for Financial Risk Management Build Now
Introducing Lévy-Flows, a new class of normalizing flow models that capture heavy-tailed financial data for improved risk management and density estimation.
Financial AI Mar 31 Code High viability
GUIDE: Reinforcement Learning for Behavioral Action Support in Type 1 Diabetes Build Now
GUIDE is a reinforcement learning framework that provides behavioral recommendations to complement automated insulin delivery systems for Type 1 Diabetes management.
Healthcare AI Apr 1 Code High viability
Omni-MMSI: Toward Identity-attributed Social Interaction Understanding Build Now
A new task and pipeline for understanding social interactions from raw audio and vision, with a focus on identity attribution and reasoning.
Multimodal AI Mar 31 Code High viability
SAGE: Subsurface AI-driven Geostatistical Extraction with proxy posterior Build Now
A framework for generating statistically consistent subsurface velocity models from sparse well logs and seismic images, enabling data-efficient seismic imaging and inversion.
Geospatial AI Mar 31 Pending High viability
A Taxonomy of Programming Languages for Code Generation Build Now
A reproducible taxonomy of programming languages by resource availability to guide dataset curation and tier-aware LLM evaluation for code generation.
LLM Development Mar 31 Code High viability
Is One Token All It Takes? Graph Pooling Tokens for LLM-based GraphQA Build Now
This research enhances Graph Question Answering by improving how graph structures are encoded into LLMs, achieving competitive results with a novel multi-token pooling approach and LoRA stabilization.
Graph Neural Networks Apr 1 Pending High viability
UCell: rethinking generalizability and scaling of bio-medical vision models Build Now
A parameter-efficient deep learning model for single-cell segmentation that matches larger models' performance and generalizes well to unseen data.
Medical AI Mar 31 Pending High viability
The Geometry of Compromise: Unlocking Generative Capabilities via Controllable Modality Alignment Build Now
A fine-tuning framework that explicitly reduces the modality gap in Vision-Language Models by aligning geometric and distributional structures for improved cross-modal tasks.
Vision-Language Models Mar 31 Code High viability
Automated Detection of Multiple Sclerosis Lesions on 7-tesla MRI Using U-net and Transformer-based Segmentation Build Now
Transformer-based models trained on 7T MRI data for automated detection of multiple sclerosis lesions that outperform classical methods, with code released for research.
Medical Imaging AI Apr 1 Pending High viability
QUEST: A robust attention formulation using query-modulated spherical attention Build Now
A novel attention mechanism for Transformers that improves training stability, performance, and robustness to data corruptions and adversarial attacks, with applications in vision and beyond.
LLM Training Mar 31 Code High viability
Epileptic Seizure Detection in Separate Frequency Bands Using Feature Analysis and Graph Convolutional Neural Network (GCN) from Electroencephalogram (EEG) Signals Build Now
A frequency-aware framework for epileptic seizure detection using GCNs on EEG signals, offering improved interpretability and diagnostic precision.
Medical AI Mar 31 Code High viability
VLM-in-the-Loop: A Plug-In Quality Assurance Module for ECG Digitization Pipelines Build Now
A plug-in quality assurance module for ECG digitization that uses VLMs and domain-specific tools for improved accuracy.
Document Digitization AI Apr 1 Code High viability
Agent Q-Mix: Selecting the Right Action for LLM Multi-Agent Systems through Reinforcement Learning Build Now
Agent Q-Mix is a reinforcement learning framework that optimizes LLM agent communication topology for improved accuracy and token efficiency in complex tasks.
LLM Agents Apr 1 Code High viability
SANA I2I: A Text Free Flow Matching Framework for Paired Image to Image Translation with a Case Study in Fetal MRI Artifact Reduction Build Now
A text-free image-to-image translation framework for medical imaging, effectively reducing artifacts with competitive performance.
Generative Image Models Mar 31 Code High viability
Neural-Assisted in-Motion Self-Heading Alignment Build Now
A neural-assisted framework for rapid and accurate initial heading estimation in autonomous ocean platforms, significantly reducing alignment time and improving accuracy.
Robotics Navigation Mar 31 Code High viability
EvolveTool-Bench: Evaluating the Quality of LLM-Generated Tool Libraries as Software Artifacts Build Now
EvolveTool-Bench evaluates LLM-generated tool libraries as software artifacts, revealing quality risks invisible to task-only evaluation and enabling better tool development.
LLM Agents Apr 1 Code High viability
Polish phonology and morphology through the lens of distributional semantics Build Now
Leveraging distributional semantics to uncover relationships between Polish word phonology, morphology, and meaning, with potential for language learning tools.
Computational Linguistics Mar 31 Code High viability
Diversity-Aware Reverse Kullback-Leibler Divergence for Large Language Model Distillation Build Now
A novel diversity-aware distillation objective for LLMs that improves performance and the fidelity-diversity trade-off.
LLM Distillation Mar 31 Code High viability
First Logit Boosting: Visual Grounding Method to Mitigate Object Hallucination in Large Vision-Language Models Build Now
A training-free method called First Logit Boosting to mitigate object hallucination in Large Vision-Language Models with negligible inference overhead.
Vision-Language Models Apr 1 Pending High viability
Executing as You Generate: Hiding Execution Latency in LLM Code Generation Build Now
An LLM code generation system that executes code concurrently with generation, reducing end-to-end latency by up to 55%.
LLM Agents Apr 1 Code High viability
Hierarchical Pre-Training of Vision Encoders with Large Language Models Build Now
A hierarchical pre-training framework that enhances vision-language alignment by enabling structured feature fusion between vision encoders and LLMs.
Vision-Language Models Mar 31 Code High viability
RAGShield: Provenance-Verified Defense-in-Depth Against Knowledge Base Poisoning in Government Retrieval-Augmented Generation Systems Build Now
RAGShield provides a five-layer defense-in-depth framework for government RAG systems, using supply chain provenance verification to prevent knowledge base poisoning attacks.
AI Security Apr 1 Code High viability
Shapley-Guided Neural Repair Approach via Derivative-Free Optimization Build Now
A Shapley-guided neural repair approach that uses derivative-free optimization to localize and fix defects like backdoors, adversarial attacks, and unfairness in DNNs.
AI Security Apr 1 Code High viability
MAC-Attention: a Match-Amend-Complete Scheme for Fast and Accurate Attention Computation Build Now
MAC-Attention accelerates LLM long-context decoding by reusing prior attention computations, reducing latency and KV accesses while maintaining fidelity.
LLM Inference Optimization Mar 31 Pending High viability
mmAnomaly: Leveraging Visual Context for Robust Anomaly Detection in the Non-Visual World with mmWave Radar Build Now
mmAnomaly is a multimodal framework combining mmWave radar and RGB-D input for robust anomaly detection in non-visual scenarios, achieving up to 94% F1 score.
Computer Vision Apr 1 Code High viability
A Dual-Stream Transformer Architecture for Illumination-Invariant TIR-LiDAR Person Tracking Build Now
A dual-stream Transformer architecture for illumination-invariant TIR-LiDAR person tracking in autonomous robots.
Robotics Perception Apr 1 Code High viability
Oblivion: Self-Adaptive Agentic Memory Control through Decay-Driven Activation Build Now
A memory control framework for LLM agents that mimics human selective forgetting to reduce interference and latency.
Agents Mar 31 Pending High viability
FGR-ColBERT: Identifying Fine-Grained Relevance Tokens During Retrieval Watch
A modified retrieval model that integrates LLM-derived relevance signals to identify fine-grained evidence cues with minimal latency overhead.
Information Retrieval Mar 31 High viability
Cost-Penalized Fitness in FMA-Orchestrated Mixture of Experts: Experimental Evidence for Molecular Memory in Domain Adaptation Build Now
A novel Mixture-of-Experts management system for LLMs that learns 'molecular memory' for faster domain adaptation, promising significant cost and energy savings.
LLM Training Apr 1 Code High viability
RegFormer: Transferable Relational Grounding for Efficient Weakly-Supervised Human-Object Interaction Detection Build Now
RegFormer is a transferable relational grounding module for efficient and accurate weakly-supervised Human-Object Interaction detection that learns spatial cues for instance-level reasoning.
Computer Vision Apr 1 Pending High viability
HarassGuard: Detecting Harassment Behaviors in Social Virtual Reality with Vision-Language Models Build Now
HarassGuard: A privacy-preserving vision-language model system for detecting physical harassment in social VR using only visual input.
Content Moderation Apr 1 Code High viability
AfrIFact: Cultural Information Retrieval, Evidence Extraction and Fact Checking for African Languages Build Now
A dataset and evaluation framework for fact-checking in ten African languages, addressing critical information gaps.
Low-Resource NLP Apr 1 Code High viability
TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning Build Now
Adapts pre-trained video reasoning models to new domains at test time, without labels, using reinforcement learning, outperforming state-of-the-art methods.
Video Reasoning Apr 1 Code High viability
ARGS: Auto-Regressive Gaussian Splatting via Parallel Progressive Next-Scale Prediction Build Now
Auto-Regressive Gaussian Splatting (ARGS) enables parallel, multi-scale 3D object generation with controllable detail and visual fidelity.
3D Object Generation Apr 1 Code High viability
HabitatAgent: An End-to-End Multi-Agent System for Housing Consultation Watch
An LLM-powered multi-agent system for end-to-end housing consultation that significantly outperforms baselines in accuracy and reliability.
Agents Apr 1 High viability
From Baselines to Preferences: A Comparative Study of LoRA/QLoRA and Preference Optimization for Mental Health Text Classification Build Now
A comparative study of LoRA/QLoRA and preference optimization for mental health text classification, providing a practical framework for choosing effective training strategies.
LLM Fine-tuning Apr 1 Code High viability
KG-CMI: Knowledge graph enhanced cross-Mamba interaction for medical visual question answering Build Now
KG-CMI is a knowledge graph enhanced framework for medical visual question answering that improves accuracy and supports free-form answers.
Medical AI Apr 1 Code High viability
How to Train your Tactile Model: Tactile Perception with Multi-fingered Robot Hands Build Now
A Vision Transformer-based tactile perception model that generalizes to new robot hand sensors, reducing data collection and retraining needs for scalable robotic manipulation.
Robotics Perception Apr 1 Code High viability
Routing-Free Mixture-of-Experts Build Now
A novel Mixture-of-Experts model that eliminates centralized routing for improved scalability and robustness in LLMs.
LLM Architecture Apr 1 Pending High viability
Streaming Model Cascades for Semantic SQL Build Now
Develops adaptive cascade algorithms for streaming, per-partition execution of large language models in data warehouses to reduce inference costs.
LLM Optimization Apr 1 Code High viability
TF-SSD: A Strong Pipeline via Synergic Mask Filter for Training-free Co-salient Object Detection Build Now
A training-free co-salient object detection pipeline leveraging SAM and DINO for improved generalization and performance.
Computer Vision Apr 1 Pending High viability
English to Central Kurdish Speech Translation: Corpus Creation, Evaluation, and Orthographic Standardization Build Now
A new speech-to-text translation dataset for Central Kurdish derived from TED talks, enabling improved translation performance through orthographic standardization.
Speech Translation Apr 1 Code High viability
Multi-Camera View Scaling for Data-Efficient Robot Imitation Learning Build Now
A framework for data-efficient robot imitation learning by scaling camera views during demonstration collection to improve generalization.
Robotics Apr 1 Code High viability
TALENT: Target-aware Efficient Tuning for Referring Image Segmentation Build Now
TALENT is a parameter-efficient tuning framework for referring image segmentation that addresses non-target activation issues to improve accuracy.
Image Segmentation Apr 1 Pending High viability
TP-Seg: Task-Prototype Framework for Unified Medical Lesion Segmentation Build Now
A unified medical lesion segmentation framework that uses task-aware adapters and prototype-guided decoders to achieve state-of-the-art performance across diverse imaging modalities and lesion types.
Medical AI Apr 1 Code High viability
Agent psychometrics: Task-level performance prediction in agentic coding benchmarks Build Now
A framework for predicting task-level performance of LLM agents in coding benchmarks, enabling better task design and agent evaluation.
AI Agents Apr 1 Code High viability
More Human, More Efficient: Aligning Annotations with Quantized SLMs Build Now
Fine-tunes quantized small language models on limited human data to create deterministic, highly aligned evaluators and annotators.
LLM Evaluation Apr 1 Pending High viability
DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale Build Now
A streaming Vision-Geometry-Action model for end-to-end autonomous driving that jointly outputs dense geometry and trajectory planning.
Autonomous Driving Apr 1 Pending High viability
An Approach to Enriching Surgical Video Datasets for Fine-Grained Spatial-Temporal Understanding of Vision-Language Models Build Now
A pipeline to generate enriched surgical video datasets for fine-grained spatial-temporal understanding, improving vision-language model performance.
Medical AI Apr 1 Code High viability
Quantum-Safe Code Auditing: LLM-Assisted Static Analysis and Quantum-Aware Risk Scoring for Post-Quantum Cryptography Migration Build Now
An LLM-assisted static analysis framework for auditing code for quantum-vulnerable cryptography and scoring migration risk.
AI Security Apr 1 Pending High viability
FreqPhys: Repurposing Implicit Physiological Frequency Prior for Robust Remote Photoplethysmography Build Now
A frequency-guided framework for robust contactless physiological monitoring from facial videos, outperforming existing methods under challenging conditions.
Medical AI Apr 1 Code High viability
Behavioral Score Diffusion: Model-Free Trajectory Planning via Kernel-Based Score Estimation from Data Watch
Behavioral Score Diffusion offers a model-free, training-free trajectory planner for robotics that uses kernel-based score estimation from data, outperforming retrieval methods.
Robotics Apr 1 Code
Asymmetric Actor-Critic for Multi-turn LLM Agents Watch
An asymmetric actor-critic framework that uses a powerful proprietary LLM as an actor and a smaller open-source critic for runtime supervision to improve reliability in multi-turn agent interactions.
LLM Agents Mar 31
ReMoGen: Real-time Human Interaction-to-Reaction Generation via Modular Learning from Diverse Data Watch
ReMoGen generates real-time human reactions to interactions using modular learning and segment-level refinement.
Motion Generation Apr 1 Code
Maximizing T2-Only Prostate Cancer Localization from Expected Diffusion Weighted Imaging Watch
A novel framework for prostate cancer localization using only T2-weighted MRI by leveraging diffusion-weighted imaging as a privileged latent modality during training.
Medical AI Apr 1 Code
Gradient-Based Data Valuation Improves Curriculum Learning for Game-Theoretic Motion Planning Watch
Gradient-based data valuation significantly improves curriculum learning for game-theoretic motion planners by identifying crucial training scenarios.
Reinforcement Learning Apr 1 Code
Orthogonal Learner for Estimating Heterogeneous Long-Term Treatment Effects Watch
Introduces novel orthogonal learners for robust estimation of heterogeneous long-term treatment effects, particularly in settings with limited data overlap.
Causal Inference Apr 1 Code
Sit-to-Stand Transitions Detection and Duration Measurement Using Smart Lacelock Sensor Watch
A shoe-mounted sensor system for accurate Sit-to-Stand transition detection and duration measurement to assess fall risk in older adults.
Health Monitoring Mar 31
Representation Selection via Cross-Model Agreement using Canonical Correlation Analysis Watch
A training-free method using Canonical Correlation Analysis to select and reduce dimensionality of image representations, improving downstream performance.
Representation Learning Apr 1 Code
Secure Forgetting: A Framework for Privacy-Driven Unlearning in Large Language Model (LLM)-Based Agents Watch
A framework for LLM-based agents to selectively forget sensitive or outdated knowledge, enabling controlled unlearning with a natural language interface.
LLM Agents Apr 1
Inverse-Free Sparse Variational Gaussian Processes Watch
An inverse-free approach for sparse variational Gaussian processes that uses only matrix multiplications for improved stability and speed.
Gaussian Processes Apr 1 Code
BioCOMPASS: Integrating Biomarkers into Transformer-Based Immunotherapy Response Prediction Watch
BioCOMPASS integrates biomarkers and treatment information into transformer models to improve the generalizability of immunotherapy response prediction.
Biomedical AI Apr 1 Code
Reliev3R: Relieving Feed-forward Reconstruction from Multi-View Geometric Annotations Watch
A weakly-supervised paradigm for training feed-forward 3D reconstruction models using only relative depths and sparse correspondences.
3D Reconstruction Apr 1 Code
Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models Watch
An AI decoding method for diffusion language models that balances generation quality with exploration for improved reasoning.
LLM Decoding Apr 1 Code
Long-Horizon Geometry-Aware Navigation among Polytopes via MILP-MPC and Minkowski-Based CBFs Watch
A hierarchical planning and control framework for geometry-aware robot navigation in complex environments using MILP-MPC and Minkowski-based CBFs.
Robotics Mar 31 Code
Improving Generalization of Deep Learning for Brain Metastases Segmentation Across Institutions Watch
A domain adaptation framework for brain metastases segmentation across institutions using VAE-MMD and nnU-Net.
Medical Imaging AI Apr 1 Code
G-Drift MIA: Membership Inference via Gradient-Induced Feature Drift in LLMs Watch
A white-box membership inference attack for LLMs that uses gradient-induced feature drift to detect training data.
LLM Privacy Apr 1 Code
A Dual-Action Fabric-Based Soft Robotic Glove for Ergonomic Hand Rehabilitation Watch
A dual-action fabric-based soft robotic glove with customized actuators for ergonomic hand rehabilitation, showing reduced muscle activity and improved grasp patterns.
Robotics Apr 1 Code
AI-Mediated Explainable Regulation for Justice Watch
A distributed AI system for explainable and adaptable regulation that models stakeholder preferences to ensure justice and legitimacy.
AI Governance Mar 31 Code
Lightweight Prompt-Guided CLIP Adaptation for Monocular Depth Estimation Watch
A lightweight, prompt-guided adapter for CLIP to perform monocular depth estimation with minimal supervision.
Monocular Depth Estimation Apr 1 Code
Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks for Continual LLM Learning Watch
A continual learning approach for LLMs that uses frozen MoE-LoRA stacks to build cross-domain cognitive capabilities.
Geospatial AI Apr 1
Do Language Models Know When They'll Refuse? Probing Introspective Awareness of Safety Boundaries Watch
Language models can predict their refusal behavior with high accuracy, enabling confidence-based routing for safety-critical applications.
LLM Safety Mar 31
Using predefined vector systems to speed up neural network multimillion class classification Watch
A method to speed up neural network classification for millions of classes by reducing label prediction complexity.
AI Infrastructure Apr 1 Code
Adapting Text LLMs to Speech via Multimodal Depth Up-Scaling Watch
A method to adapt text LLMs to speech by inserting and training new transformer layers, minimizing text capability degradation.
Speech AI Apr 1
VibeGuard: A Security Gate Framework for AI-Generated Code Watch
VibeGuard is a pre-publish security gate for AI-generated code, addressing blind spots in artifact hygiene, packaging, and supply-chain risk.
AI Security Apr 1
Adaptive Parallel Monte Carlo Tree Search for Efficient Test-time Compute Scaling Watch
Adaptive Parallel Monte Carlo Tree Search with negative early exit and a boosting mechanism that reduces p99 latency and improves throughput for LLM reasoning without sacrificing accuracy.
LLM Inference Optimization Apr 1
Shape Representation using Gaussian Process mixture models Watch
A lightweight functional shape representation using Gaussian Process mixture models for efficient and accurate 3D geometry encoding.
3D Shape Representation Apr 1 Code
Towards Viewpoint-Robust End-to-End Autonomous Driving with 3D Foundation Model Priors Watch
Leveraging 3D foundation model priors to improve viewpoint robustness in end-to-end autonomous driving trajectory planning.
Autonomous Driving Apr 1 Code
Revision or Re-Solving? Decomposing Second-Pass Gains in Multi-LLM Pipelines Watch
Decomposing multi-LLM pipeline gains into re-solving, scaffolding, and content correction to inform targeted pipeline designs.
LLM Agents Apr 1 Code
Improvisational Games as a Benchmark for Social Intelligence of AI Agents: The Case of Connections Watch
A new wordplay game benchmark to evaluate the social intelligence and reasoning capabilities of AI agents.
AI Agents Mar 31 Code
LightGuard: Transparent WiFi Security via Physical-Layer LiFi Key Bootstrapping Watch
LightGuard enhances WiFi security by offloading cryptographic key establishment to a physically confined LiFi channel.
Cybersecurity Apr 1
PrivHAR-Bench: A Graduated Privacy Benchmark Dataset for Video-Based Action Recognition Watch
A standardized benchmark dataset and evaluation toolkit for privacy-preserving video action recognition, enabling nuanced analysis of the privacy-utility trade-off.
Privacy-Preserving AI Apr 1 Code
StretchBot: A Neuro-Symbolic Framework for Adaptive Guidance with Assistive Robots Watch
StretchBot: A neuro-symbolic framework for adaptive guidance in assistive robots, combining multimodal perception with LLM reasoning for context-aware adjustments.
Assistive Robotics Apr 1
NFC based inventory control system for secure and efficient communication Watch
A secure and efficient inventory control system using NFC tags to replace vulnerable barcodes for retail applications.
Inventory Management Mar 31
Polysemanticity or Polysemy? Lexical Identity Confounds Superposition Metrics Watch
A method to disentangle lexical confounds from semantic superposition in large language models, improving word sense disambiguation and knowledge editing.
LLM Interpretability Apr 1
Transfer learning for nonparametric Bayesian networks Watch
Two transfer learning methods for nonparametric Bayesian networks to improve learning performance with scarce data.
Bayesian Networks Apr 1 Code
Emotion Entanglement and Bayesian Inference for Multi-Dimensional Emotion Understanding Watch
A theory-grounded benchmark and Bayesian inference framework for multi-dimensional emotion understanding in rich textual contexts.
Multi-Dimensional Emotion Understanding Apr 1 Code
When Career Data Runs Out: Structured Feature Engineering and Signal Limits for Founder Success Prediction Watch
This work engineers structured features from founder career data to predict startup success, outperforming LLM baselines and diagnosing dataset limitations for future improvements.
Startup Success Prediction Apr 1 Pending
UK AISI Alignment Evaluation Case-Study Watch
Evaluating frontier AI models for research sabotage when deployed as coding assistants, finding some models refuse safety-relevant tasks.
AI Safety Apr 1
One Panel Does Not Fit All: Case-Adaptive Multi-Agent Deliberation for Clinical Prediction Watch
A case-adaptive multi-agent system that dynamically assembles specialist panels for clinical prediction, improving accuracy and transparency.
Clinical AI Agents Mar 31
Infinite-Horizon Ergodic Control via Kernel Mean Embeddings Watch
An infinite-horizon ergodic controller using kernel mean embeddings for long-duration coverage tasks.
Robotics Apr 1 Code
Toward Personalized Darts Training: A Data-Driven Framework Based on Skeleton-Based Biomechanical Analysis and Motion Modeling Watch
A data-driven framework for personalized darts training that uses biomechanical analysis and motion modeling to provide targeted feedback.
Sports Analytics Apr 1
A wearable haptic device for edge and surface simulation Watch
A compact, lightweight wearable haptic device that simulates distinct edge and surface contact feedback for enhanced object manipulation in VR.
Haptic Devices Apr 1 Code
Internal State-Based Policy Gradient Methods for Partially Observable Markov Potential Games Ignore
A theoretical framework for multi-agent reinforcement learning in partially observable games using internal states and a policy gradient method, with a proven convergence bound.
Multi-Agent RL Apr 1 Code
A Decoupled Basis-Vector-Driven Generative Framework for Dynamic Multi-Objective Optimization Ignore
A decoupled generative framework for dynamic multi-objective optimization that uses wavelet transform and sparse dictionary learning to track moving Pareto fronts with zero-shot inference.
Optimization Apr 1 Code
Collaborative AI Agents and Critics for Fault Detection and Cause Analysis in Network Telemetry Ignore
Collaborative AI agents and critics for fault detection and cause analysis in network telemetry, leveraging private cost functions and foundation models.
Multi-Agent Systems Mar 31
Mine-JEPA: In-Domain Self-Supervised Learning for Mine-Like Object Classification in Side-Scan Sonar Ignore
Mine-JEPA is an in-domain self-supervised learning pipeline for side-scan sonar mine classification, outperforming larger foundation models in data-scarce maritime imagery.
Computer Vision Apr 1
Full-Gradient Successor Feature Representations Ignore
A novel reinforcement learning algorithm that improves transfer learning by optimizing successor features with full gradient updates, offering theoretical convergence guarantees and empirical performance gains.
Reinforcement Learning Apr 1 Code
Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation Ignore
An influence-guided framework that optimizes synthetic data generation for LLMs by using target model feedback to adapt data curation rubrics.
LLM Training Apr 1
From Early Encoding to Late Suppression: Interpreting LLMs on Character Counting Tasks Ignore
Interpreting LLM failures on character counting tasks to reveal internal reasoning and output suppression mechanisms.
LLM Interpretability Apr 1 Code
Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis Ignore
A benchmark and analysis of continual learning for remote sensing vision-language models to address catastrophic forgetting.
Continual Learning for Remote Sensing Apr 1 Code
RT-GS: Gaussian Splatting with Reflection and Transmittance Primitives Ignore
A unified framework for Gaussian Splatting that integrates microfacet material models and ray tracing to jointly model specular reflection and transmittance for realistic novel view synthesis.
3D Reconstruction Apr 1 Code
A Japanese Benchmark for Evaluating Social Bias in Reasoning Based on Attribution Theory Ignore
A Japanese benchmark dataset to evaluate social bias in LLM reasoning based on attribution theory, sensitive to cultural nuances.
LLM Bias Apr 1 Code
Informed Machine Learning with Knowledge Landmarks Ignore
KD-ML integrates numeric data with granular knowledge landmarks to build more accurate models, outperforming data-driven approaches on physics-governed benchmarks.
Informed Machine Learning Mar 31 Code
Advancing Complex Video Object Segmentation via Tracking-Enhanced Prompt: The 1st Winner for 5th PVUW MOSE Challenge Ignore
A training-free approach using tracking-enhanced prompts for complex video object segmentation.
Video Object Segmentation Apr 1
On rankings in multiplayer games with an application to the game of Whist Ignore
A novel extension of the Bradley-Terry model for multiplayer games with an adapted algorithm for ranking.
Game Theory Apr 1 Code
Lightweight, Practical Encrypted Face Recognition with GPU Support Ignore
Lightweight and practical encrypted face recognition with GPU acceleration, reducing memory and improving runtime.
Privacy-Preserving AI Apr 1
GPT-NL Public Corpus: A Permissively Licensed, Dutch-First Dataset for LLM Pre-training Ignore
GPT-NL Public Corpus is a large, permissively licensed dataset of Dutch language resources for LLM pre-training.
LLM Datasets Apr 1 Code
AutoEG: Exploiting Known Third-Party Vulnerabilities in Black-Box Web Applications Ignore
An automated multi-agent framework for generating exploits against known vulnerabilities in black-box web applications.
Web Security Apr 1
Accurate and Scalable Matrix Mechanisms via Divide and Conquer Ignore
A novel divide-and-conquer approach for scalable and accurate differentially private query answering and synthetic data generation.
Differential Privacy Apr 1 Code
REM-CTX: Automated Peer Review via Reinforcement Learning with Auxiliary Context Ignore
An automated peer review system that incorporates auxiliary context like figures and external scholarly signals using reinforcement learning.
LLM Applications Mar 31
Scheduling LLM Inference with Uncertainty-Aware Output Length Predictions Ignore
An uncertainty-aware scheduling metric for LLM inference that significantly reduces latency and improves throughput by accounting for output length variability.
LLM Inference Scheduling Apr 1
Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention Ignore
Stochastic Attention, inspired by fruit fly connectomes, enhances efficient attention mechanisms by introducing random routing for improved expressivity and receptive field growth.
LLM Efficiency Apr 1
Learning to Shuffle: Block Reshuffling and Reversal Schemes for Stochastic Optimization Ignore
Discovering novel data shuffling strategies for stochastic optimization using LLM-guided program evolution to improve convergence.
LLM Training Mar 31 Code
Cybersecurity Risk Assessment for CubeSat Missions: Adapting Established Frameworks for Resource-Constrained Environments Ignore
A cybersecurity risk assessment framework adapted for resource-constrained CubeSat missions, offering proportionate guidance for mission designers and regulators.
Cybersecurity Frameworks Mar 31 Code
MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control Ignore
An efficient and expressive text-to-speech system using only state-space models for conditioning, reducing parameters and improving throughput.
Text-to-Speech Mar 31
Toward Optimal Sampling Rate Selection and Unbiased Classification for Precise Animal Activity Recognition Ignore
A novel network for animal activity recognition that customizes features and calibrates classifiers to improve accuracy across all behaviors, even with imbalanced data.
Animal Activity Recognition Apr 1 Code
To Memorize or to Retrieve: Scaling Laws for RAG-Considerate Pretraining Ignore
A framework for understanding the trade-offs between pretraining and retrieval for language models, guiding optimal data allocation.
LLM Pretraining & RAG Apr 1 Pending
Preference Guided Iterated Pareto Referent Optimisation for Accessible Route Planning Ignore
An interactive route planning algorithm for users with accessibility needs that prioritizes user feedback for efficient optimization.
Route Planning Apr 1
Human-in-the-Loop Control of Objective Drift in LLM-Assisted Computer Science Education Ignore
A human-in-the-loop curriculum for computer science education that teaches students to control objective drift in LLM-assisted workflows.
LLM Education Mar 31
Hierarchical Apprenticeship Learning from Imperfect Demonstrations with Evolving Rewards Ignore
HALIDE: A hierarchical apprenticeship learning framework that leverages imperfect student demonstrations to infer evolving rewards and improve pedagogical decisions.
Apprenticeship Learning Mar 31 Code
From Domain Understanding to Design Readiness: a playbook for GenAI-supported learning in Software Engineering Ignore
A playbook for using GenAI tutors to improve student learning in software engineering by grounding them in course knowledge.
GenAI Education Mar 31
Deep Learning-Accelerated Surrogate Optimization for High-Dimensional Well Control in Stress-Sensitive Reservoirs Ignore
A deep learning framework accelerates high-dimensional well control optimization in stress-sensitive reservoirs by using a surrogate model to reduce computational cost by orders of magnitude.
Reservoir Optimization Apr 1
Autonomous Adaptive Solver Selection for Chemistry Integration via Reinforcement Learning Ignore
A reinforcement learning framework that autonomously selects solvers for chemistry integration to reduce computational cost while maintaining accuracy.
Scientific Computing Mar 31 Code
Super-Resolving Coarse-Resolution Weather Forecasts With Flow Matching Ignore
A generative super-resolution framework that enhances the spatial resolution of weather forecasts as a post-processing step.
Generative AI Apr 1
SoftHand Model-W: A 3D-Printed, Anthropomorphic, Underactuated Robot Hand with Integrated Wrist and Carpal Tunnel Ignore
A 3D-printed, anthropomorphic robot hand with an integrated wrist and carpal tunnel design that enables more versatile and human-like manipulation.
Robotics Hardware Apr 1 Code
OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning Ignore
A benchmark for evaluating large multimodal models on the complex task of converting PCB schematic diagrams into structured, spatially weighted netlist graphs.
Multimodal Document Understanding Mar 31 Code
Enhancing REST API Fuzzing with Access Policy Violation Checks and Injection Attacks Ignore
Enhancing REST API fuzzing with novel oracles for access policy violations and injection attacks, generating executable test cases.
API Security Apr 1
Neuropsychiatric Deviations From Normative Profiles: An MRI-Derived Marker for Early Alzheimer's Disease Detection Ignore
A deep learning framework using MRI-derived brain anatomy to predict early Alzheimer's disease conversion from neuropsychiatric symptoms.
Medical AI Apr 1
Tucker Diffusion Model for High-dimensional Tensor Generation Ignore
A novel Tucker diffusion model for generating structured high-dimensional tensor data with theoretical advantages over vectorized approaches.
Generative Models Apr 1 Code
Vocal Prognostic Digital Biomarkers in Monitoring Chronic Heart Failure: A Longitudinal Observational Study Ignore
Voice features can predict health deterioration in chronic heart failure patients, offering a non-invasive alternative to current monitoring methods.
Medical AI Mar 31
Fast and Accurate Probing of In-Training LLMs' Downstream Performances Ignore
A new method to evaluate LLM performance during training, reducing evaluation time from hours to minutes.
LLM Evaluation Apr 1
Collaborative Task and Path Planning for Heterogeneous Robotic Teams using Multi-Agent PPO Ignore
A collaborative planning strategy using Multi-Agent PPO to coordinate heterogeneous robotic teams for efficient extraterrestrial exploration.
Robotics Apr 1 Pending
Deep Reinforcement Learning for Robotic Manipulation under Distribution Shift with Bounded Extremum Seeking Ignore
Improving robotic manipulation robustness under distribution shift by combining deep reinforcement learning with bounded extremum seeking.
Robotics Apr 1
Obfuscating Code Vulnerabilities against Static Analysis in JavaScript Code Ignore
This research quantifies the impact of JavaScript obfuscation on static code analysis tools, revealing significant weaknesses in current security pipelines.
Code Security Apr 1 Code
Property-Level Flood Risk Assessment Using AI-Enabled Street-View Lowest Floor Elevation Extraction and ML Imputation Across Texas Ignore
Brainstacks is a modular architecture for continual multi-domain LLM fine-tuning using frozen MoE-LoRA adapter stacks that enable cross-domain cognitive capability composition.
LLM Continual Learning Apr 1 Code
Sub-metre Lunar DEM Generation and Validation from Chandrayaan-2 OHRC Multi-View Imagery Using Open-Source Photogrammetry Ignore
Generating sub-meter lunar digital elevation models from high-resolution orbital imagery using an open-source photogrammetry pipeline.
Geospatial AI Apr 1
EmbedPart: Embedding-Driven Graph Partitioning for Scalable Graph Neural Network Training Ignore
EmbedPart is an embedding-driven graph partitioning approach that accelerates scalable distributed GNN training by clustering node embeddings.
GNN Training Apr 1
The Recipe Matters More Than the Kitchen: Mathematical Foundations of the AI Weather Prediction Pipeline Ignore
A theoretical framework and empirical validation for improving AI weather prediction by focusing on the entire learning pipeline, not just architecture.
AI Weather Prediction Apr 1 Code
Paper Reconstruction Evaluation: Evaluating Presentation and Hallucination in AI-written Papers Ignore
PaperRecon is a new evaluation framework for AI-written papers, disentangling presentation quality from hallucination risks.
AI Ethics & Evaluation Apr 1 Code
Hierarchical Discrete Flow Matching for Graph Generation Ignore
A hierarchical generative framework for graph generation that reduces computational cost and generation time.
Graph Generation Mar 31 Code
Representation choice shapes the interpretation of protein conformational dynamics Ignore
A library for computing and analyzing multiple protein representations to provide a comparative framework for molecular dynamics simulations.
Scientific AI Apr 1
Multimodal Language Models Cannot Spot Spatial Inconsistencies Ignore
Demonstrates that current multimodal language models fail to detect spatial inconsistencies in 3D scenes, highlighting a gap in their physical understanding.
Multimodal AI Apr 1
Multicentric thrombus segmentation using an attention-based recurrent network with gradual modality dropout Ignore
An attention-based recurrent network with gradual modality dropout for multicentric thrombus segmentation in 3D brain scans.
Medical Imaging Segmentation Apr 1
Fluently Lying: Adversarial Robustness Can Be Substrate-Dependent Ignore
This research identifies a new adversarial failure mode in spiking neural network object detectors where detection count is preserved but accuracy collapses.
Adversarial Robustness Apr 1
PRISM: Differentiable Analysis-by-Synthesis for Fixel Recovery in Diffusion MRI Ignore
A differentiable framework for improved fiber recovery in diffusion MRI by fitting explicit multi-compartment models over spatial patches.
Medical AI Mar 31
Deep Networks Favor Simple Data Ignore
This research analyzes how deep networks favor simpler data, revealing a consistent pattern across various models and datasets, but lacks a clear product path.
LLM Analysis Apr 1 Code
A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation Ignore
A multi-agent LLM framework simulating supportive behavioral health dialogue through coordinated, role-differentiated agents with continuous safety auditing.
Agents Mar 31
MF-QAT: Multi-Format Quantization-Aware Training for Elastic Inference Ignore
A method for training a single model to be robust across multiple quantization formats for elastic inference.
LLM Training Apr 1
A Cross-graph Tuning-free GNN Prompting Framework Ignore
A tuning-free GNN prompting framework for cross-graph adaptation without retraining.
Graph Neural Networks Apr 1
Beyond Latency: A System-Level Characterization of MPC and FHE for PPML Ignore
A system-level characterization of MPC and FHE for privacy-preserving machine learning, evaluating performance, energy, and cost across various scenarios.
Privacy-Preserving ML Mar 31
Speech LLMs are Contextual Reasoning Transcribers Ignore
A novel approach to automatic speech recognition that leverages large language models for contextual reasoning and user-guided transcription.
Speech AI Apr 1
UniMixer: A Unified Architecture for Scaling Laws in Recommendation Systems Ignore
A unified architecture for recommendation systems that unifies mainstream scaling blocks and optimizes scaling efficiency.
Recommendation Systems Apr 1
Not My Truce: Personality Differences in AI-Mediated Workplace Negotiation Ignore
Investigating how personality traits moderate the effectiveness of AI-mediated workplace negotiation coaching, suggesting tailored interventions for different user profiles.
AI Coaching Apr 1
A Physical Imitation Learning Pipeline for Energy-Efficient Quadruped Locomotion Assisted by Parallel Elastic Joint Ignore
Physical Imitation Learning distills RL policies into passive body dynamics for energy-efficient quadruped locomotion, reducing control effort through parallel elastic joints.
Robotics Apr 1
Convergence of Byzantine-Resilient Gradient Tracking via Probabilistic Edge Dropout Ignore
A novel stochastic gradient tracking method designed to maintain convergence properties in distributed optimization networks with Byzantine agents.
Distributed Optimization Apr 1
A CEFR-Inspired Classification Framework with Fuzzy C-Means To Automate Assessment of Programming Skills in Scratch Ignore
A framework for assessing programming skills in Scratch using fuzzy clustering, inspired by CEFR levels, to identify learning gaps and guide curriculum design.
Educational AI Apr 1
Logarithmic Scores, Power-Law Discoveries: Disentangling Measurement from Coverage in Agent-Based Evaluation Ignore
Investigating the relationship between score saturation and issue discovery in LLM-based agent evaluations to understand the optimal number of agents needed.
Agent Evaluation Apr 1
Can Large Language Models Self-Correct in Medical Question Answering? An Exploratory Study Ignore
An exploratory study on whether large language models can self-correct in medical question answering, finding inconsistent benefits.
LLM Applications Mar 31 Code
Suppressing Non-Semantic Noise in Masked Image Modeling Representations Ignore
A post-hoc method to suppress non-semantic information in image representations for improved zero-shot performance.
Computer Vision Mar 31
Hierarchical Motion Planning and Control under Unknown Nonlinear Dynamics via Predicted Reachability Ignore
A hierarchical framework for autonomous motion planning and control under unknown nonlinear dynamics using piecewise-affine models and graph-based reachability.
Robotics Mar 31 Code
Certificate-Driven Closed-Loop Multi-Agent Path Finding with Inheritable Factorization Ignore
A certificate-driven approach for closed-loop multi-agent path finding that improves scalability and global guarantees by incorporating inheritable factorization.
Robotics Apr 1 Code
The Rashomon Effect for Visualizing High-Dimensional Data Ignore
A framework for visualizing high-dimensional data by embracing the multiplicity of embeddings, leading to more interpretable and robust representations.
Data Visualization Apr 1
Efficient Software Vulnerability Detection Using Transformer-based Models Ignore
Uses transformer models with program slices to improve accuracy in detecting software vulnerabilities.
Code Security Mar 31
Offline Constrained RLHF with Multiple Preference Oracles Ignore
Offline constrained reinforcement learning from human feedback with multiple preference oracles for performance and safety trade-offs.
LLM Alignment Mar 31
Self-Routing: Parameter-Free Expert Routing from Hidden States Ignore
A parameter-free routing mechanism for Mixture-of-Experts models that eliminates the need for a learned router.
LLM Training Apr 1
Deconfounding Scores and Representation Learning for Causal Effect Estimation with Weak Overlap Ignore
A theoretical framework for causal effect estimation that improves overlap conditions using novel 'deconfounding scores' for high-dimensional data.
Causal Inference Apr 1
GRASP: Gradient Realignment via Active Shared Perception for Multi-Agent Collaborative Optimization Ignore
A novel framework for multi-agent collaboration that uses active shared perception to overcome non-stationarity and accelerate convergence.
Multi-Agent Systems Apr 1 Code
The Persistent Vulnerability of Aligned AI Systems Ignore
This thesis explores AI safety by automating circuit discovery, removing dangerous behaviors with Latent Adversarial Training, and analyzing agentic misalignment in frontier models.
AI Safety Mar 31
Decision-Centric Design for LLM Systems Ignore
A decision-centric framework for LLM systems that separates control decisions from generation for improved reliability and diagnosability.
LLM Systems Apr 1
Lipschitz Dueling Bandits over Continuous Action Spaces Ignore
A novel algorithm for dueling bandits over continuous action spaces with Lipschitz structure, achieving logarithmic space complexity.
Reinforcement Learning Apr 1
Scenario theory for multi-criteria data-driven decision making Ignore
A generalized scenario theory for multi-criteria data-driven decision making that provides more accurate robustness certificates.
Decision Making Apr 1 Code
Large Language Models in the Abuse Detection Pipeline Ignore
This survey analyzes the integration of Large Language Models into the Abuse Detection Lifecycle, covering label generation, detection, review, and governance, while highlighting challenges and future directions.
Abuse Detection Mar 31
Performance of Neural and Polynomial Operator Surrogates Ignore
Compares neural and polynomial operator surrogates for parametric PDEs, finding no universally superior method and highlighting the importance of matching methodology to problem characteristics.
Scientific ML Apr 1
Neural Collapse Dynamics: Depth, Activation, Regularisation, and Feature Norm Threshold Ignore
Identifies a critical feature norm threshold that predicts the onset of neural collapse in deep networks.
Deep Learning Theory Mar 31 Code
Phase space integrity in neural network models of Hamiltonian dynamics: A Lagrangian descriptor approach Ignore
A new diagnostic framework using Lagrangian Descriptors to evaluate the phase space integrity of neural network models for Hamiltonian dynamics.
Physics-Informed AI Apr 1 Code
Lead Zirconate Titanate Reservoir Computing for Classification of Written and Spoken Digits Ignore
Utilizing Lead Zirconate Titanate as a physical reservoir for improved handwritten digit classification.
Hardware AI Mar 31 Code
Multi-Agent LLM Governance for Safe Two-Timescale Reinforcement Learning in SDN-IoT Defense Ignore
A multi-agent LLM governance system for safe, two-timescale reinforcement learning in SDN-IoT defense.
Agents Apr 1
Agentic Tool Use in Large Language Models Ignore
Organizes and analyzes the literature on agentic tool use in large language models across three paradigms, highlighting challenges and evolutionary views.
LLM Agents Apr 1
Stein Variational Uncertainty-Adaptive Model Predictive Control Ignore
A novel controller for nonlinear dynamical systems that uses Stein variational inference to adapt to latent parametric uncertainty, improving performance-robustness tradeoffs.
Robotics Control Apr 1 Code
Adversarial Attacks in AI-Driven RAN Slicing: SLA Violations and Recovery Ignore
Studies the impact of adversarial attacks on AI-driven RAN slicing decisions, quantifying SLA violations and recovery behavior.
Network Security Apr 1
Reasoning Shift: How Context Silently Shortens LLM Reasoning Ignore
This paper investigates how context length and conversational settings silently shorten LLM reasoning traces, potentially impacting performance on complex tasks.
LLM Reasoning Apr 1 Code
Screening Is Enough Ignore
Multiscreen is a novel language model architecture that uses screening to achieve absolute query-key relevance, reducing parameters and improving inference latency.
LLM Architecture Apr 1
Debiased Estimators in High-Dimensional Regression: A Review and Replication of Javanmard and Montanari (2014) Ignore
Examines and replicates a debiased LASSO framework for high-dimensional regression, comparing its performance and power against the desparsified LASSO.
Statistical Inference Apr 1 Code
Temporal Dependencies in In-Context Learning: The Role of Induction Heads Ignore
Investigating the role of induction heads in LLMs for understanding temporal dependencies in in-context learning.
LLM Internals Apr 1
Optimal Brain Decomposition for Accurate LLM Low-Rank Approximation Ignore
A theoretical framework for optimal low-rank decomposition of LLM weights using second-order Hessian information.
LLM Training Apr 1
OrgAgent: Organize Your Multi-Agent System like a Company Ignore
A hierarchical framework for organizing multi-agent systems like a company to improve reasoning, reduce costs, and enhance coordination.
Agents Apr 1
Universal YOCO for Efficient Depth Scaling Ignore
Universal YOCO combines a decoder-decoder architecture with recursive computation for efficient depth scaling of Large Language Models, improving inference efficiency.
LLM Training Apr 1 Code
Bridging Structured Knowledge and Data: A Unified Framework with Finance Applications Ignore
A unified framework embeds structured knowledge into neural networks for consistent estimation and improved financial applications.
Econometric AI Apr 1
A Survey of On-Policy Distillation for Large Language Models Ignore
A survey providing a unified framework and comprehensive overview of on-policy distillation techniques for large language models, addressing exposure bias.
LLM Training Apr 1
Trust and Reliance on AI in Education: AI Literacy and Need for Cognition as Moderators Ignore
This research explores how students' trust, AI literacy, and need for cognition influence their reliance on AI assistants in programming tasks, suggesting a need for better instructional support.
AI Literacy in Education Apr 1
Sampling-based Task and Kinodynamic Motion Planning under Semantic Uncertainty Ignore
An anytime algorithm for integrated task and kinodynamic motion planning under semantic uncertainty in partially observable environments.
Robotics Apr 1
Phase transition on a context-sensitive random language model with short range interactions Ignore
Investigates phase transitions in context-sensitive random language models with short-range interactions to understand the intrinsic nature of language properties.
Language Model Theory Apr 1 Code
Efficient DPF-based Error-Detecting Information-Theoretic Private Information Retrieval Over Rings Ignore
A novel information-theoretic scheme for error-detecting private information retrieval over rings that reduces key size and communication overhead.
Privacy-Preserving Systems Apr 1
Explainable AI for Blind and Low-Vision Users: Navigating Trust, Modality, and Interpretability in the Agentic Era Ignore
Developing accessible explainable AI for blind and low-vision users to foster trust and independent use of autonomous AI agents.
Explainable AI Mar 31
Denoising distances beyond the volumetric barrier Ignore
A novel approach for reconstructing the latent geometry of Riemannian manifolds from noisy distance data, breaking the volumetric barrier in higher dimensions.
Geometric Reconstruction Apr 1 Code
Event Embedding of Protein Networks: Compositional Learning of Biological Function Ignore
Investigating whether enforcing compositional structure in sequence embeddings improves geometric organization in protein-protein interaction networks.
Bioinformatics AI Apr 1
Making Sense of AI Agents Hype: Adoption, Architectures, and Takeaways from Practitioners Ignore
A review of practitioner conference talks to understand how companies adopt, architect, and implement LLM-driven AI agentic systems.
Agents Mar 31
Learning Shared Representations for Multi-Task Linear Bandits Ignore
A theoretical framework for multi-task linear bandits that improves sample efficiency by learning shared low-rank representations.
Reinforcement Learning Apr 1
Common TF-IDF variants arise as key components in the test statistic of a penalized likelihood-ratio test for word burstiness Ignore
This paper provides a statistical perspective on TF-IDF, showing how it arises from a penalized likelihood-ratio test for word burstiness.
NLP Term Weighting Apr 1
Scalable Pretraining of Large Mixture of Experts Language Models on Aurora Super Computer Ignore
This paper details the large-scale pretraining of Mixture of Experts language models on a supercomputer, focusing on optimizing training speed and stability.
LLM Training Apr 1 Code
Generalization Bounds for Spectral GNNs via Fourier Domain Analysis Ignore
Develops theoretical generalization bounds for spectral graph neural networks by analyzing their behavior in the graph Fourier domain.
Graph Neural Networks Apr 1
Predicting Wave Reflection and Transmission in Heterogeneous Media via Fourier Operator-Based Transformer Modeling Ignore
A transformer-based ML model that approximates solutions to Maxwell's equations for wave reflection and transmission.
Scientific ML Mar 31
Softmax gradient policy for variance minimization and risk-averse multi armed bandits Ignore
A theoretical algorithm for risk-aware multi-armed bandits that minimizes variance in reward selection.
Reinforcement Learning Mar 31
Measuring the Representational Alignment of Neural Systems in Superposition Ignore
A theoretical framework for understanding neural network representations in superposition, highlighting limitations of current alignment metrics.
Neural Representation Analysis Mar 31
Reachability-Aware Time Scaling for Path Tracking Ignore
A theoretical approach to scale speeds along robot paths to ensure collision-free waypoint tracking under acceleration limits.
Robotics Path Tracking Apr 1
Narrative Fingerprints: Multi-Scale Author Identification via Novelty Curve Dynamics Ignore
Authors leave unique 'fingerprints' in the novelty dynamics of their writing, detectable at both book and chapter levels.
Author Identification Apr 1
Go Big or Go Home: Simulating Mobbing Behavior with Braitenbergian Robots Ignore
Simulating mobbing behavior in Braitenbergian robots using the Webots platform to explore the effects of mobbing call range and group size on predator harassment.
Robotics Simulation Apr 1
Reconsidering Dependency Networks from an Information Geometry Perspective Ignore
An information-geometric analysis of dependency networks and pseudo-Gibbs sampling.
Probabilistic Modeling Apr 1
Valency Classification of Mapudungun Verbal Roots. Established by the language's own morphotactics Ignore
Classifying the valency of Mapudungun verbal roots using the language's own morphotactics to improve a morphological analyzer.
NLP Linguistics Apr 1
Convergence of projected stochastic natural gradient variational inference for various step size and sample or batch size schedules Ignore
Theoretical convergence analysis of projected stochastic natural gradient variational inference under various step size and sample/batch size schedules, providing new non-asymptotic results.
Bayesian Inference Apr 1
Breaking Data Symmetry is Needed For Generalization in Feature Learning Kernels Ignore
Investigating the phenomenon of grokking in feature learning kernels, showing that breaking data symmetry is crucial for generalization.
Machine Learning Theory Mar 31
LAtent Phase Inference from Short time sequences using SHallow REcurrent Decoders (LAPIS-SHRED) Ignore
LAPIS-SHRED is a modular architecture for reconstructing and forecasting complete spatiotemporal dynamics from sparse sensor observations in short temporal windows.
Scientific AI Apr 1
On the Necessity of Pre-agreed Secrets for Thwarting Last-minute Coercion: Vulnerabilities and Lessons From the Loki E-voting Protocol Ignore
Identifies vulnerabilities in the Loki e-voting protocol and argues for the necessity of pre-agreed secrets to prevent last-minute coercion.
Security Mar 31
Frege in the Flesh: Biolinguistics and the Neural Enforcement of Syntactic Structures Ignore
Exploring the biological foundations of human language through mathematical and algebraic models of syntactic structures.
Linguistics Mar 31
Model-Based Learning of Near-Optimal Finite-Window Policies in POMDPs Ignore
Developing a sample-efficient model estimation procedure for learning policies in partially observable environments.
Reinforcement Learning Apr 1
Towards Initialization-dependent and Non-vacuous Generalization Bounds for Overparameterized Shallow Neural Networks Ignore
Develops fully initialization-dependent complexity bounds for shallow neural networks with general Lipschitz activation functions, offering non-vacuous generalization bounds for overparameterized models.
Machine Learning Theory Apr 1
Play-Testing REMind: Evaluating an Educational Robot-Mediated Role-Play Game Ignore
REMind is an educational robot-mediated role-play game designed to support anti-bullying bystander intervention among children.
Educational Robotics Mar 31
Rapid mixing in positively weighted restricted Boltzmann machines Ignore
Theoretical analysis of mixing time bounds for positively weighted restricted Boltzmann machines.
Machine Learning Theory Apr 1