VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation Build Now
VLAA-GUI provides a modular framework for robust GUI automation through enhanced error handling and recovery features.
GitHub 7 stars Velocity flat History 1 snapshot GUI Automation Apr 23 Pending High viability
BioMiner: A Multi-modal System for Automated Mining of Protein-Ligand Bioactivity Data from Literature Build Now
BioMiner is a multi-modal system that automates the extraction of protein-ligand bioactivity data from literature, significantly accelerating drug discovery workflows.
GitHub 80 stars Velocity flat History 1 snapshot Drug Discovery AI Apr 23 Pending High viability
VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution Build Now
VARestorer distills pre-trained visual autoregressive models into a one-step super-resolution system, achieving state-of-the-art results with significantly faster inference.
GitHub 4 stars Velocity flat History 1 snapshot Image Super-Resolution Apr 23 Pending High viability
Adaptive Instruction Composition for Automated LLM Red-Teaming Build Now
An adaptive instruction composition framework for LLM red-teaming that jointly optimizes effectiveness and diversity of attacks using reinforcement learning.
GitHub 17 stars Velocity flat History 1 snapshot LLM Security Apr 22 Pending High viability
StructMem: Structured Memory for Long-Horizon Behavior in LLMs Build Now
StructMem is a novel memory framework for LLMs that enhances long-term conversational agents by preserving event relationships, reducing costs and improving performance.
GitHub 769 stars Velocity flat History 1 snapshot LLM Memory Apr 23 Pending High viability
Spatial Metaphors for LLM Memory: A Critical Analysis of the MemPalace Architecture Build Now
An open-source AI memory system applying spatial metaphors to LLMs, offering a novel verbatim storage philosophy and low-cost offline operation.
GitHub 5 stars Velocity flat History 1 snapshot LLM Memory Apr 23 Pending High viability
Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows Build Now
Optimize LLM-based workflows by reducing tool-related token overhead and improving efficiency with dynamic tool gating.
GitHub 0 stars Velocity flat History 1 snapshot AI Optimized Workflows Apr 23 Pending High viability
AEL: Agent Evolving Learning for Open-Ended Environments Build Now
Agent Evolving Learning (AEL) enables LLM agents to learn from experience in open-ended environments by dynamically selecting retrieval policies and using LLM-driven reflection to interpret past outcomes.
GitHub 0 stars Velocity flat History 1 snapshot Agents Apr 23 Pending High viability
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering Build Now
HypEHR: A compact hyperbolic model for efficient Electronic Health Record question answering, outperforming LLMs with fewer parameters.
GitHub 0 stars Velocity flat History 1 snapshot Medical AI Apr 22 Pending High viability
Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems Build Now
A hybrid multi-robot control framework using waypoint-based bi-level planning to jointly optimize task and motion planning in cluttered environments.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Robot Systems Apr 22 Pending High viability
Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks Build Now
A co-evolutionary framework that enables LLM agents to learn and reuse structured skills for long-horizon tasks, significantly improving performance in game environments.
GitHub 6 stars Velocity flat History 1 snapshot AI Agents Apr 22 Pending High viability
SQLyzr: A Comprehensive Benchmark and Evaluation Platform for Text-to-SQL Build Now
SQLyzr is a comprehensive benchmark and evaluation platform for Text-to-SQL models, offering realistic evaluation and fine-grained analysis.
GitHub 1 star Velocity flat History 1 snapshot Text-to-SQL Apr 23 Pending High viability
StyleVAR: Controllable Image Style Transfer via Visual Autoregressive Modeling Build Now
A controllable image style transfer method using visual autoregressive modeling and reinforcement learning to generate high-quality stylized images.
GitHub 5 stars Velocity flat History 1 snapshot Generative Image Apr 22 Pending High viability
Trust-SSL: Additive-Residual Selective Invariance for Robust Aerial Self-Supervised Learning Build Now
A novel self-supervised learning strategy that enhances robustness to image corruptions in aerial imagery, offering interpretable uncertainty signals.
GitHub 0 stars Velocity flat History 1 snapshot Robust Self-Supervised Learning Apr 23 Pending High viability
GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion Build Now
GS-Quant is a novel framework for Knowledge Graph Completion that generates semantically coherent and structurally stratified discrete codes for LLMs, outperforming existing methods.
GitHub 4 stars Velocity flat History 1 snapshot Knowledge Graph Completion Apr 23 Pending High viability
Why are all LLMs Obsessed with Japanese Culture? On the Hidden Cultural and Regional Biases of LLMs Watch
This work introduces a new dataset to analyze and reveal hidden cultural and regional biases in LLMs, showing a surprising preference for Japanese culture.
GitHub 1473 stars Velocity flat History 1 snapshot LLM Bias Apr 23 Pending
Bridging the Training-Deployment Gap: Gated Encoding and Multi-Scale Refinement for Efficient Quantization-Aware Image Enhancement Build Now
An efficient image enhancement model for mobile devices that maintains high quality and fast processing speeds through quantization-aware training and a hierarchical network architecture.
GitHub 2 stars Velocity flat History 1 snapshot Image Enhancement Apr 23 Pending High viability
SemanticAgent: A Semantics-Aware Framework for Text-to-SQL Data Synthesis Build Now
A semantics-aware framework for text-to-SQL data synthesis that uses specialized modules for analysis, synthesis, and verification to ensure semantic validity.
GitHub 0 stars Velocity flat History 1 snapshot Text-to-SQL Apr 23 Pending High viability
Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning Build Now
A framework to mitigate spurious signals in test-time reinforcement learning for math reasoning in LLMs, improving accuracy and stability.
GitHub 1 star Velocity flat History 1 snapshot LLM Reasoning Apr 23 Pending High viability
Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models Build Now
A novel function hijacking attack that manipulates tool selection in agentic AI models, demonstrating significant vulnerabilities and the need for enhanced security.
GitHub 1867 stars Velocity flat History 1 snapshot LLM Security Apr 22 Pending High viability
Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation Build Now
A diffusion-based framework that enhances controllable human video generation by systematically investigating and optimizing the use of synthetic data.
GitHub 714 stars Velocity flat History 1 snapshot Generative Video Apr 23 Pending High viability
On Reasoning Behind Next Occupation Recommendation Build Now
Fine-tuning LLMs with generated reasons to improve next occupation prediction accuracy.
GitHub 0 stars Velocity flat History 1 snapshot LLM Applications Apr 23 Pending High viability
SyMTRS: Benchmark Multi-Task Synthetic Dataset for Depth, Domain Adaptation and Super-Resolution in Aerial Imagery Build Now
A synthetic dataset and simulation pipeline for multi-task aerial imagery analysis, enabling joint research in depth estimation, domain adaptation, and super-resolution.
GitHub 0 stars Velocity flat History 1 snapshot Computer Vision Apr 23 Pending High viability
Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models Build Now
A framework for dual attribution and verification to enhance factual reliability and interpretability of LLM outputs.
GitHub 0 stars Velocity flat History 1 snapshot LLM Reliability Apr 23 Pending High viability
EngramaBench: Evaluating Long-Term Conversational Memory with Structured Graph Retrieval Build Now
EngramaBench is a new benchmark for evaluating long-term conversational memory in LLMs, featuring a graph-structured memory system called Engrama that shows promise in cross-space reasoning.
GitHub 0 stars Velocity flat History 1 snapshot LLM Memory Systems Apr 23 Pending High viability
HiCrew: Hierarchical Reasoning for Long-Form Video Understanding via Question-Aware Multi-Agent Collaboration Watch
A hierarchical multi-agent framework for long-form video understanding that improves temporal reasoning through question-aware collaboration.
GitHub 0 stars Velocity flat History 1 snapshot Video Understanding Apr 23 Pending
Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics Build Now
A large-scale, open dataset and foundation models for medical robotics, enabling autonomous procedures and advancing robot learning with multi-embodiment simulation.
GitHub stars n/a Velocity flat History 1 snapshot Medical Robotics Apr 22 Code High viability
MISTY: High-Throughput Motion Planning via Mixer-based Single-step Drifting Build Now
MISTY is a high-throughput generative motion planner that achieves state-of-the-art closed-loop performance with pure single-step inference for autonomous driving.
GitHub stars n/a Velocity flat History 1 snapshot Autonomous Driving Apr 23 Code High viability
Addressing Image Authenticity When Cameras Use Generative AI Watch
A post-capture method using an image-specific MLP and encoder to recover the 'unhallucinated' version of camera images altered by generative AI.
GitHub 714 stars Velocity flat History 1 snapshot Image Authenticity Apr 23 Pending
Deep FinResearch Bench: Evaluating AI's Ability to Conduct Professional Financial Investment Research Watch
A benchmark for evaluating AI agents in financial investment research, highlighting current limitations and guiding the development of specialized AI for finance.
GitHub 1867 stars Velocity flat History 1 snapshot Financial AI Apr 22 Pending
Leveraging Multimodal LLMs for Built Environment and Housing Attribute Assessment from Street-View Imagery Build Now
A multimodal LLM framework that assesses building conditions from street-view imagery, outperforming human raters and offering distilled models for efficient, large-scale deployment.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal LLMs Apr 22 Code High viability
Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding Build Now
A reinforcement learning framework co-evolves a proposer and visual critic to achieve precise pixel-level localization for natural language instructions in GUIs.
GitHub stars n/a Velocity flat History 1 snapshot GUI Grounding Apr 23 Code High viability
Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems Build Now
A parameter-efficient training framework for multi-agent systems that jointly optimizes latent communication with reasoning, improving performance on complex tasks.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Agent Systems Apr 23 Code High viability
ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures Watch
ReCAPA is a hierarchical predictive correction framework for Vision-Language-Action systems that mitigates cascading failures by adjusting actions, subgoals, and trajectories, outperforming LLM baselines on embodied agent benchmarks.
GitHub 1867 stars Velocity flat History 1 snapshot Robotics / Embodied AI Apr 23 Pending
SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference Build Now
SparKV is an adaptive KV cache loading framework for on-device LLM inference that reduces Time-to-First-Token by up to 5.1x and lowers energy consumption, making LLMs more practical for edge devices.
GitHub stars n/a Velocity flat History 1 snapshot LLM Inference Optimization Apr 23 Code High viability
Thinking with Reasoning Skills: Fewer Tokens, More Accuracy Watch
This research proposes a method to reduce token usage and improve LLM accuracy by storing and recalling reusable reasoning skills, offering practical economic benefits for deployment.
GitHub 3 stars Velocity flat History 1 snapshot LLM Reasoning Apr 23 Pending
Evaluating AI Meeting Summaries with a Reusable Cross-Domain Pipeline Build Now
A reusable, cross-domain evaluation pipeline for AI meeting summaries that supports benchmarking and improving generative AI applications, with a focus on accuracy and retention.
GitHub stars n/a Velocity flat History 1 snapshot AI Meeting Summaries Apr 23 Code High viability
DryRUN: On the Role of Public Tests in LLM-Driven Code Generation Build Now
DryRUN enables LLM-driven code generation without human-provided test cases by autonomously generating inputs and self-correcting.
GitHub stars n/a Velocity flat History 1 snapshot LLM Code Generation Apr 23 Code High viability
TRAVELFRAUDBENCH: A Configurable Evaluation Framework for GNN Fraud Ring Detection in Travel Networks Build Now
TravelFraudBench is a configurable evaluation framework for GNN-based fraud ring detection in travel networks, released as an open-source Python package, with reported results that outperform baselines and reach 100% ring recovery.
GitHub stars n/a Velocity flat History 1 snapshot GNN Fraud Detection Apr 22 Code High viability
Efficient Logic Gate Networks for Video Copy Detection Build Now
An efficient logic gate network for video copy detection that achieves competitive accuracy with significantly smaller descriptors and faster inference.
GitHub stars n/a Velocity flat History 1 snapshot Video Analysis Apr 23 Code High viability
TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale Build Now
TingIS automatically identifies and reacts to risk events from customer incidents in real-time to prevent financial losses.
GitHub stars n/a Velocity flat History 1 snapshot Real-Time Event Detection Apr 23 Code High viability
To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning Build Now
This research introduces a method to improve transformer models' symbolic reasoning by addressing representational collapse, enabling generalization to unseen tokens and offering insights for fine-tuning open-weight models.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 23 Code High viability
Seeing Fast and Slow: Learning the Flow of Time in Videos Build Now
A self-supervised framework for learning and manipulating the flow of time in videos, enabling speed change detection, slow-motion video generation, and temporal super-resolution.
GitHub stars n/a Velocity flat History 1 snapshot Generative Video Apr 23 Code High viability
Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition Build Now
This research benchmarks LLM decoder bias in speech recognition, revealing that audio encoder design, not LLM scale, is key for equitable and robust performance.
GitHub stars n/a Velocity flat History 1 snapshot Speech Recognition Apr 23 Code High viability
Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers Build Now
A stealthy backdoor attack framework for LLMs that uses natural style triggers and stabilizes payload injection to make the attack robust.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 23 Code High viability
Divide-then-Diagnose: Weaving Clinician-Inspired Contexts for Ultra-Long Capsule Endoscopy Videos Build Now
A clinician-inspired framework for summarizing ultra-long capsule endoscopy videos and diagnosing gastrointestinal issues by mimicking human diagnostic workflows.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 23 Code High viability
CSC: Turning the Adversary's Poison against Itself Build Now
A defense mechanism that segregates and conceals poisoned data clusters during training to prevent backdoor attacks on deep neural networks.
GitHub stars n/a Velocity flat History 1 snapshot AI Security Apr 23 Code High viability
A Scale-Adaptive Framework for Joint Spatiotemporal Super-Resolution with Diffusion Models Build Now
A scale-adaptive framework for joint spatiotemporal super-resolution using diffusion models, enabling a single architecture to handle diverse spatial and temporal upscaling factors.
GitHub stars n/a Velocity flat History 1 snapshot Diffusion Models Apr 23 Code High viability
InVitroVision: a Multi-Modal AI Model for Automated Description of Embryo Development using Natural Language Build Now
A fine-tuned vision-language model that automatically describes embryo development using natural language, outperforming commercial models with limited data.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 22 Code High viability
GiVA: Gradient-Informed Bases for Vector-Based Adaptation Build Now
GiVA offers a parameter-efficient fine-tuning method for large models that matches LoRA's speed while significantly reducing rank requirements, making adaptation more accessible.
GitHub stars n/a Velocity flat History 1 snapshot LLM Fine-tuning Apr 23 Code High viability
GeoMind: An Agentic Workflow for Lithology Classification with Reasoned Tool Invocation Build Now
GeoMind is an agentic framework for lithology classification in well logs, providing geologically plausible and evidence-grounded decisions through reasoned tool invocation.
GitHub stars n/a Velocity flat History 1 snapshot Geoscience AI Apr 23 Code High viability
CAP: Controllable Alignment Prompting for Unlearning in LLMs Build Now
CAP is a prompt-driven framework for controllable and reversible unlearning in LLMs, enabling selective knowledge removal without parameter updates.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 23 Code High viability
Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model Build Now
A zero-shot LLM-generated text detection method that requires no preference collection or additional training, outperforming existing methods.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 23 Code High viability
Planning Beyond Text: Graph-based Reasoning for Complex Narrative Generation Build Now
PLOTTER is a framework for complex narrative generation that uses graph-based reasoning to ensure global coherence and character development.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Apr 23 Code High viability
ReaGeo: Reasoning-Enhanced End-to-End Geocoding with LLMs Build Now
An end-to-end geocoding framework using LLMs and Chain-of-Thought reasoning to accurately handle explicit and vague location queries.
GitHub stars n/a Velocity flat History 1 snapshot LLM Geocoding Apr 23 Code High viability
Can MLLMs "Read" What is Missing? Build Now
MMTR-Bench, a new benchmark to evaluate Multimodal Large Language Models' ability to reconstruct masked text directly from visual context.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal LLMs Apr 23 Code High viability
Drug Synergy Prediction via Residual Graph Isomorphism Networks and Attention Mechanisms Build Now
A graph neural network with attention predicts drug synergy, reducing expensive experimental validation for combination therapies.
GitHub stars n/a Velocity flat History 1 snapshot Drug Discovery AI Apr 23 Code High viability
Fine-Grained Perspectives: Modeling Explanations with Annotator-Specific Rationales Build Now
A framework for modeling annotator-specific rationales to generate fine-grained explanations that improve predictive performance and represent disagreement faithfully.
GitHub stars n/a Velocity flat History 1 snapshot Explainable AI Apr 23 Code High viability
A Deep U-Net Framework for Flood Hazard Mapping Using Hydraulic Simulations of the Wupper Catchment Build Now
A U-Net based deep learning model for rapid and accurate flood hazard mapping, serving as a computationally efficient alternative to traditional hydraulic simulations.
GitHub stars n/a Velocity flat History 1 snapshot Geospatial AI Apr 22 Code High viability
Thinking Like a Botanist: Challenging Multimodal Language Models with Intent-Driven Chain-of-Inquiry Build Now
A benchmark and framework for training multimodal models to reason like expert botanists through intent-driven, multi-turn inquiry.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal Reasoning Apr 22 Code High viability
Hybrid Deep Learning Approach for Coupled Demand Forecasting and Supply Chain Optimization Build Now
A hybrid AI framework integrates LSTM-based demand forecasting with MILP optimization to enhance supply chain resilience and efficiency, reducing costs and stockouts.
GitHub stars n/a Velocity flat History 1 snapshot Supply Chain Optimization Apr 23 Code High viability
Promoting Simple Agents: Ensemble Methods for Event-Log Prediction Build Now
This work proposes a novel 'promotion algorithm' for event log prediction that matches or exceeds neural model accuracy with lower computational cost, offering a stable and efficient alternative.
GitHub stars n/a Velocity flat History 1 snapshot Event Log Prediction Apr 23 Code High viability
Probabilistic Verification of Neural Networks via Efficient Probabilistic Hull Generation Build Now
A novel framework for probabilistic neural network verification that efficiently generates probabilistic hulls to compute guaranteed ranges for safe probabilities, outperforming state-of-the-art methods.
GitHub stars n/a Velocity flat History 1 snapshot Neural Network Verification Apr 23 Code High viability
Mind the Prompt: Self-adaptive Generation of Task Plan Explanations via LLMs Build Now
A self-adaptive system that automatically refines LLM prompts for task plan explanations by modeling user cognitive states.
GitHub stars n/a Velocity flat History 1 snapshot LLM Prompt Engineering Apr 22 Code High viability
MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment Build Now
MiMIC addresses visual modality collapse and semantic misalignment in universal multimodal retrieval with a novel fusion architecture and robust training.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal Retrieval Apr 23 Code High viability
Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction Build Now
A multi-agent AI system using generative video and computer vision for personalized at-home physiotherapy with real-time pose correction.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 22 Code High viability
Unbiased Prevalence Estimation with Multicalibrated LLMs Build Now
This research develops a multicalibration technique for Large Language Models to ensure unbiased prevalence estimation across diverse populations, applicable to various classification tasks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Applications Apr 23 Code High viability
From Noise to Intent: Anchoring Generative VLA Policies with Residual Bridges Build Now
ResVLA is a novel generative VLA policy that anchors robotic motion on predicted intent, refining local dynamics for improved efficiency and robustness.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 23 Code High viability
TraceScope: Interactive URL Triage via Decoupled Checklist Adjudication Build Now
An interactive URL triage system that uses a sandboxed operator agent and an adjudicator agent to detect sophisticated phishing campaigns with high precision and recall.
GitHub stars n/a Velocity flat History 1 snapshot Phishing Detection Apr 23 Code High viability
FairQE: Multi-Agent Framework for Mitigating Gender Bias in Translation Quality Estimation Build Now
A plug-and-play framework that uses multi-agent reasoning to mitigate gender bias in translation quality estimation without sacrificing accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Machine Translation Apr 23 Code High viability
CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors Build Now
This research introduces CorridorVLA, a novel approach for generative action heads in Vision-Language-Action models that uses explicit spatial anchors to guide action generation, improving success rates on challenging benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot Robotics / Embodied AI Apr 23 Code High viability
Using Machine Mental Imagery for Representing Common Ground in Situated Dialogue Build Now
An active visual scaffolding framework for situated dialogue agents to maintain common ground by converting dialogue state into a persistent visual history.
GitHub stars n/a Velocity flat History 1 snapshot Situated Dialogue Apr 22 Code High viability
Generative Discovery of Magnetic Insulators under Competing Physical Constraints Ignore
A constraint-guided generative framework for discovering magnetic insulators by integrating language models with evolutionary selection and first-principles validation.
GitHub 5 stars Velocity flat History 1 snapshot Materials Discovery AI Apr 22 Pending
Conjecture and Inquiry: Quantifying Software Performance Requirements via Interactive Retrieval-Augmented Preference Elicitation Build Now
IRAP quantifies vague software performance requirements into mathematical functions through interactive retrieval-augmented preference elicitation, achieving significant improvements.
GitHub stars n/a Velocity flat History 1 snapshot Software Engineering Apr 23 Code High viability
Who Defines "Best"? Towards Interactive, User-Defined Evaluation of LLM Leaderboards Build Now
An interactive visualization tool that allows users to define custom evaluation priorities for LLM leaderboards, improving transparency and context-specific model assessment.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 23 Code High viability
VG-CoT: Towards Trustworthy Visual Reasoning via Grounded Chain-of-Thought Build Now
A dataset and benchmark for trustworthy visual reasoning that links reasoning steps to image evidence, improving LVLM performance and trustworthiness.
GitHub stars n/a Velocity flat History 1 snapshot Visual Reasoning Apr 23 Code High viability
Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning Build Now
TaNOS is a continual pre-training framework that improves the robustness of numerical reasoning over table data by decoupling domain semantics and numerical operation structure.
GitHub stars n/a Velocity flat History 1 snapshot Numerical Reasoning Apr 23 Code High viability
Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms Build Now
Detect cross-session threats in AI agents with a novel dataset, measurement framework, and bounded-memory reader algorithm.
GitHub stars n/a Velocity flat History 1 snapshot AI Agent Security Apr 22 Code High viability
Beyond Single Plots: A Benchmark for Question Answering on Multi-Charts Build Now
PolyChartQA, a new dataset and benchmark for question answering on multi-chart images, reveals significant performance drops in state-of-the-art multimodal language models, highlighting a critical area for improvement in visual reasoning.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Chart Question Answering Apr 23 Code High viability
Adaptive Test-Time Compute Allocation with Evolving In-Context Demonstrations Build Now
An adaptive framework for test-time compute allocation in LLMs that optimizes performance and reduces inference costs by dynamically adjusting computation and generation strategies.
GitHub stars n/a Velocity flat History 1 snapshot LLM Inference Optimization Apr 22 Code High viability
Expanding the extreme-k dielectric materials space through physics-validated generative reasoning Build Now
An AI framework that uses language models and physics validation to discover new high-kappa dielectric materials, expanding the known space by 35%.
GitHub stars n/a Velocity flat History 1 snapshot Materials Discovery AI Apr 22 Code High viability
Multi-Agent Empowerment and Emergence of Complex Behavior in Groups Ignore
Extending the concept of empowerment to multi-agent systems to study the emergence of complex group behaviors and organization.
GitHub 1867 stars Velocity flat History 1 snapshot Agents Apr 22 Pending
Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning Build Now
This research identifies representation as a key bottleneck in abstract visual reasoning by demonstrating how symbolic input significantly improves LLM performance on complex visual tasks, suggesting a path for more capable AI systems.
GitHub stars n/a Velocity flat History 1 snapshot Abstract Visual Reasoning Apr 23 Code High viability
The CriticalSet problem: Identifying Critical Contributors in Bipartite Dependency Networks Build Now
This work introduces a novel approach to identify critical contributors in bipartite dependency networks by modeling the problem as a coalitional game and proposing a fast, near-optimal algorithm.
GitHub stars n/a Velocity flat History 1 snapshot Graph Mining Apr 23 Code High viability
A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from Documents Build Now
A multimodal approach combining graph-based learning and LLMs for open-domain event extraction that outperforms state-of-the-art methods.
GitHub stars n/a Velocity flat History 1 snapshot Information Extraction Apr 23 Code High viability
Ideological Bias in LLMs' Economic Causal Reasoning Watch
This research reveals systematic ideological bias in LLMs' economic causal reasoning, showing a consistent skew towards intervention-oriented perspectives and highlighting the need for direction-aware evaluation in policy analysis.
GitHub stars n/a Velocity flat History 1 snapshot LLM Economic Bias Apr 23 Code
Agentic AI-assisted coding offers a unique opportunity to instill epistemic grounding during software development Watch
This research proposes GROUNDING.md, an epistemic grounding document for agentic AI coding that enforces scientific correctness and best practices, democratizing bespoke software development.
GitHub stars n/a Velocity flat History 1 snapshot Agentic AI Coding Apr 23 Code
Materialistic RIR: Material Conditioned Realistic RIR Generation Watch
Generate realistic room impulse responses by disentangling spatial and material influences for enhanced acoustic control.
GitHub stars n/a Velocity flat History 1 snapshot Generative Audio Apr 22 Code
Time, Causality, and Observability Failures in Distributed AI Inference Systems Watch
A system to ensure accurate observability in distributed AI inference pipelines by addressing clock skew issues.
GitHub stars n/a Velocity flat History 1 snapshot Distributed AI Observability Apr 23 Code
Enhancing Online Recruitment with Category-Aware MoE and LLM-based Data Augmentation Watch
This LLM-based method enhances online recruitment by augmenting low-quality job descriptions and using category-aware MoE to identify similar candidate-job pairs, improving conversion rates.
GitHub stars n/a Velocity flat History 1 snapshot Recruitment AI Apr 23 High viability
mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code Ignore
Fine-tuning LLMs for detecting machine-generated code across various programming languages and subtasks.
GitHub 0 stars Velocity flat History 1 snapshot LLM Code Analysis Apr 23 Pending
Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs. Explicit User Profiles Watch
Quantifying LLM bias by disentangling implicit linguistic signals from explicit user profiles to reveal safety paradoxes.
GitHub stars n/a Velocity flat History 1 snapshot LLM Bias Apr 22 Code
Quotient-Space Diffusion Models Ignore
A formal framework for diffusion modeling on quotient spaces, simplifying learning for tasks with inherent symmetries like molecular structure generation.
GitHub 1867 stars Velocity flat History 1 snapshot Generative AI Apr 23 Pending
When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs Watch
A new benchmark system for diagnosing and mitigating prompt-induced hallucinations in large vision-language models (LVLMs).
GitHub stars n/a Velocity flat History 1 snapshot AI Hallucination Mitigation Apr 23 Code
Dynamical Priors as a Training Objective in Reinforcement Learning Ignore
A novel training objective for reinforcement learning that introduces dynamical priors to shape temporally coherent decision-making without altering rewards or environments.
GitHub 0 stars Velocity flat History 1 snapshot Reinforcement Learning Apr 23 Pending
Enhancing Science Classroom Discourse Analysis through Joint Multi-Task Learning for Reasoning-Component Classification Watch
Automate science classroom discourse analysis by jointly classifying utterance type and reasoning components using LLM-augmented RoBERTa.
GitHub stars n/a Velocity flat History 1 snapshot LLM Applications Apr 22 Code
The First Challenge on Remote Sensing Infrared Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview Ignore
A challenge benchmark for remote sensing infrared image super-resolution, driving research and development of effective solutions.
GitHub 3 stars Velocity flat History 1 snapshot Image Super-Resolution Apr 23 Pending
Probably Approximately Consensus: On the Learning Theory of Finding Common Ground Watch
An efficient algorithm for identifying broadly agreeable topics in online communities by modeling consensus as an interval in a one-dimensional opinion space.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 23 Code
A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair Watch
A metamorphic testing approach to diagnose and mitigate data leakage in LLM-based program repair.
GitHub stars n/a Velocity flat History 1 snapshot LLM Program Repair Apr 23 Code
CoFEE: Reasoning Control for LLM-Based Feature Discovery Watch
CoFEE enhances LLM-based feature discovery by enforcing cognitive reasoning behaviors for improved predictability and efficiency.
GitHub stars n/a Velocity flat History 1 snapshot LLM Feature Engineering Apr 23
Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models Watch
Build AI-powered game agents leveraging large language models for creating interactive learning environments.
GitHub stars n/a Velocity flat History 1 snapshot AI for Gaming Apr 23 Code
Supervised Learning Has a Necessary Geometric Blind Spot: Theory, Consequences, and Minimal Repair Ignore
This paper theoretically identifies a fundamental geometric blind spot in supervised learning and proposes a diagnostic and a minimal repair.
GitHub 0 stars Velocity flat History 1 snapshot AI Theory Apr 23 Pending
Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI Watch
A new evaluation framework for rule-governed AI that measures policy-grounded correctness, moving beyond simple agreement metrics.
GitHub stars n/a Velocity flat History 1 snapshot AI Governance Apr 22
Attention-based multiple instance learning for predominant growth pattern prediction in lung adenocarcinoma wsi using foundation models Watch
This study introduces an attention-based multiple instance learning framework using pre-trained pathology foundation models to predict lung adenocarcinoma growth patterns from whole slide images.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 23
Causal Disentanglement for Full-Reference Image Quality Assessment Ignore
A novel image quality assessment method using causal inference and decoupled representation learning to achieve competitive performance across various image domains and settings.
GitHub stars n/a Velocity flat History 1 snapshot Image Quality Assessment Apr 23 Code
On the Role of Preprocessing and Memristor Dynamics in Reservoir Computing for Image Classification Ignore
Leveraging memristor dynamics for efficient and robust image classification in reservoir computing systems.
GitHub stars n/a Velocity flat History 1 snapshot Neuromorphic Computing Apr 23 Code
Process Supervision via Verbal Critique Improves Reasoning in Large Language Models Watch
Verbal Process Supervision (VPS) is a training-free framework that enhances LLM reasoning by using structured natural-language critique, outperforming existing methods on complex benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 23
Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation Watch
This paper proposes an efficient knowledge distillation method to integrate LLM-generated user profiles into sequential recommenders without requiring LLM inference at serving time.
GitHub stars n/a Velocity flat History 1 snapshot Recommender Systems Apr 23
Trustworthy Clinical Decision Support Using Meta-Predicates and Domain-Specific Languages Ignore
A framework using meta-predicates and a DSL enhances trustworthy clinical decision support by asserting epistemological constraints on decision rules for auditability.
GitHub stars n/a Velocity flat History 1 snapshot Clinical Decision Support Apr 23 Code
Robustness Analysis of POMDP Policies to Observation Perturbations Ignore
A framework for analyzing the robustness of POMDP policies to observation perturbations, with applications in robotics and operations research.
GitHub stars n/a Velocity flat History 1 snapshot Robotics & Operations Research Apr 23 Code
Calibeating Prediction-Powered Inference Ignore
A Python package for semisupervised mean estimation that calibrates prediction models to improve accuracy and efficiency.
GitHub stars n/a Velocity flat History 1 snapshot Statistical Inference Apr 23 Code
Bounding the Black Box: A Statistical Certification Framework for AI Risk Regulation Ignore
A statistical certification framework for AI risk regulation that provides auditable bounds on system failure rates without requiring model internals.
GitHub stars n/a Velocity flat History 1 snapshot AI Risk Regulation Apr 23 Code
Adversarial Evasion in Non-Stationary Malware Detection: Minimizing Drift Signals through Similarity-Constrained Perturbations Ignore
A novel approach to generate adversarial malware samples that evade detection and minimize drift signals through similarity-constrained perturbations.
GitHub stars n/a Velocity flat History 1 snapshot Malware Detection Apr 23 Code
Differentially Private Model Merging Ignore
Generate privacy-compliant models from existing trained models without retraining, adapting to any differential privacy requirement.
GitHub stars n/a Velocity flat History 1 snapshot Privacy-Preserving ML Apr 22 Code
AI-Gram: When Visual Agents Interact in a Social Network Watch
A live platform for studying social dynamics in autonomous visual agent networks, revealing insights into communication and identity.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 23
Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models Watch
A diagnostic framework and mitigation technique to address alignment faking in language models by identifying and steering away from value conflicts.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 22
Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management Watch
A novel preference fine-tuning method to align LLMs with domain-specific human preferences for generating online review responses, addressing hallucination and conservatism.
GitHub stars n/a Velocity flat History 1 snapshot LLM Fine-tuning Apr 23
Differentially Private De-identification of Dutch Clinical Notes: A Comparative Evaluation Ignore
Combining LLMs with differential privacy offers a better privacy-utility trade-off for de-identifying Dutch clinical notes compared to traditional methods.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 23
The Last Harness You'll Ever Build Ignore
Automates the engineering of AI agent harnesses for complex tasks, eliminating the need for human intervention in adapting agents to new domains.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 22
Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies Ignore
A novel LLM architecture decouples user data from shared weights using composable adapters and deletable proxies for privacy-preserving personalization and deterministic unlearning.
GitHub stars n/a Velocity flat History 1 snapshot LLM Personalization Apr 23
A Multi-Stage Warm-Start Deep Learning Framework for Unit Commitment Ignore
A multi-stage deep learning framework that uses transformer predictions as a warm-start for traditional solvers to accelerate unit commitment in power grids.
GitHub stars n/a Velocity flat History 1 snapshot Energy AI Apr 23
Replay-buffer engineering for noise-robust quantum circuit optimization Ignore
Novel replay buffer engineering and curriculum RL techniques for noise-robust quantum circuit optimization.
GitHub stars n/a Velocity flat History 1 snapshot Quantum Computing Optimization Apr 23 Code
Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models Ignore
A new multi-turn attack technique that exploits stateless moderation in LLMs by distributing adversarial intent across isolated interactions.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 23
Inferring High-Level Events from Timestamped Data: Complexity and Medical Applications Ignore
A logic-based approach for inferring high-level temporal events from timestamped data, with applications in medical domains like disease episode detection.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 23
Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation Ignore
Analysis of task-specific subnetworks in multi-task RL for autonomous underwater navigation to improve interpretability and efficiency.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 23
From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation Ignore
An agentic architecture that translates natural language research questions into scientific workflows using LLMs, generators, and domain-specific 'Skills'.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 23
Using ASP(Q) to Handle Inconsistent Prioritized Data Ignore
This paper explores the use of Answer Set Programming with Quantifiers (ASP(Q)) for inconsistency-tolerant querying of prioritized data, introducing new semantics and implementations for optimal repairs.
GitHub stars n/a Velocity flat History 1 snapshot Knowledge Representation Apr 23
Efficient Agent Evaluation via Diversity-Guided User Simulation Ignore
DIVERT is an efficient, snapshot-based, coverage-guided user simulation framework for systematic exploration of agent-user interactions to evaluate LLM agents.
GitHub stars n/a Velocity flat History 1 snapshot Agent Evaluation Apr 23
Brief chatbot interactions produce lasting changes in human moral values Ignore
Brief chatbot interactions can cause lasting, undetected shifts in human moral values, highlighting a vulnerability to AI manipulation.
GitHub stars n/a Velocity flat History 1 snapshot AI Ethics Apr 23
Satisfying Rationality Postulates of Structured Argumentation Through Deductive Support – Technical Report Ignore
This paper introduces Deductive ASPIC®, a novel formal framework for structured argumentation that satisfies critical rationality postulates under credulous semantics.
GitHub stars n/a Velocity flat History 1 snapshot Argumentation AI Apr 23
How English Print Media Frames Human-Elephant Conflicts in India Ignore
A computational analysis of media framing of human-elephant conflicts in India using a multi-model sentiment framework to identify negative portrayals and inform conservation efforts.
GitHub stars n/a Velocity flat History 1 snapshot Media Analysis Apr 23
Dilated CNNs for Periodic Signal Processing: A Low-Complexity Approach Ignore
A low-complexity dilated CNN approach for periodic signal denoising and waveform estimation, suitable for resource-constrained environments.
GitHub stars n/a Velocity flat History 1 snapshot Signal Processing Apr 23
Architectures for Robust Self-Organizing Energy Systems under Information and Control Constraints Ignore
This paper presents theoretical architectures for robust self-organizing energy systems under information and control constraints, focusing on agent-based cyber-physical systems.
GitHub stars n/a Velocity flat History 1 snapshot Agent Systems Apr 23
Geometric Monomial (GEM): a family of rational 2N-differentiable activation functions Ignore
A family of smooth activation functions that offer ReLU-like performance with rational arithmetic and improved optimization for deep neural networks.
GitHub stars n/a Velocity flat History 1 snapshot Activation Functions Apr 23
Cross-Entropy Is Load-Bearing: A Pre-Registered Scope Test of the K-Way Energy Probe on Bidirectional Predictive Coding Ignore
Investigates the load-bearing role of cross-entropy in bidirectional predictive coding networks by testing the effect of its removal.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 23 Pending
How VLAs (Really) Work In Open-World Environments Ignore
Analyzing and proposing new evaluation protocols for vision-language-action models in robotics to assess safety and robustness.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 23 Code
SGD at the Edge of Stability: The Stochastic Sharpness Gap Ignore
Theoretical framework explaining and predicting the sharpness gap in SGD for neural network training, offering insights into optimization dynamics.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Optimization Apr 22 Code
Scaling of Gaussian Kolmogorov-Arnold Networks Ignore
Investigating the role of the Gaussian scale parameter in Gaussian Kolmogorov-Arnold Networks to improve their approximation behavior and provide a practical design principle.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 23 Pending
Integrated packing, placement, scheduling, and routing of personalized production: a pharmaceutical Industry 4.0 use-case with a planar transport system Ignore
An integrated framework for packing, placement, scheduling, and routing in personalized pharmaceutical production using planar transport systems.
GitHub stars n/a Velocity flat History 1 snapshot Manufacturing Optimization Apr 22
AGNT2: Autonomous Agent Economies on Interaction-Optimized Layer 2 Infrastructure Ignore
AGNT2 provides a dedicated Layer 2 infrastructure for autonomous AI agent coordination, optimizing for high-frequency service invocations.
GitHub stars n/a Velocity flat History 1 snapshot Blockchain Infrastructure Apr 22
Doubly Saturated Ramsey Graphs: A Case Study in Computer-Assisted Mathematical Discovery Ignore
Using SAT solving and LLM-generated code to discover mathematical graph families and formalize proofs.
GitHub stars n/a Velocity flat History 1 snapshot AI for Science Apr 23
Propensity Inference: Environmental Contributors to LLM Behaviour Ignore
Develops and applies methods to measure LLM propensity for unsanctioned behavior by analyzing environmental factors, finding equal contributions from strategic and non-strategic influences.
GitHub stars n/a Velocity flat History 1 snapshot LLM Behavior Analysis Apr 22
A Systematic Review and Taxonomy of Reinforcement Learning-Model Predictive Control Integration for Linear Systems Ignore
A systematic review and taxonomy of integrating Reinforcement Learning with Model Predictive Control for linear systems.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 22
A novel approach to data interaction for complex domains, inspired by active objects.
GitHub stars n/a Velocity flat History 1 snapshot Data Interaction Apr 22
AI Governance under Political Turnover: The Alignment Surface of Compliance Design Ignore
A formal model exploring how AI compliance layers in government can be strategically exploited by future administrations, leading to increased vulnerability despite initial oversight improvements.
GitHub stars n/a Velocity flat History 1 snapshot AI Governance Apr 22
Modulating Cross-Modal Convergence with Single-Stimulus, Intra-Modal Dispersion Ignore
A methodology using the Generalized Procrustes Algorithm to measure intra-modal representational convergence at the single-stimulus level, modulating cross-modal convergence between vision and language models.
GitHub stars n/a Velocity flat History 1 snapshot Cross-Modal Representation Apr 23 Pending
Alignment has a Fantasia Problem Ignore
A research agenda for designing AI systems that provide cognitive support by actively helping users form and refine their intent over time, addressing 'Fantasia interactions'.
GitHub stars n/a Velocity flat History 1 snapshot AI Assistants Apr 23 Pending
Reasoning Primitives in Hybrid and Non-Hybrid LLMs Ignore
This paper investigates reasoning primitives in LLMs, comparing hybrid and attention-only architectures on tasks requiring recall and state-tracking.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 23
Post-AGI Economies: Autonomy and the First Fundamental Theorem of Welfare Economics Ignore
A theoretical exploration of how varying degrees of autonomy in artificial systems impact economic welfare theorems in post-AGI economies.
GitHub stars n/a Velocity flat History 1 snapshot AI Economics Apr 23
Engaged AI Governance: Addressing the Last Mile Challenge Through Internal Expert Collaboration Ignore
This paper explores the challenges of translating AI governance requirements into software development practice within an AI startup, focusing on expert collaboration to address the 'Last Mile Challenge'.
GitHub stars n/a Velocity flat History 1 snapshot AI Governance Apr 23
TAPO-Description Logic for Information Behavior: Refined OBoxes, Inference, and Categorical Semantics Ignore
Developing a refined description logic for information behavior, incorporating procedural and oracle-sensitive layers with categorical semantics.
GitHub stars n/a Velocity flat History 1 snapshot AI Theory Apr 23
Enabling and Inhibitory Pathways of University Students' Willingness to Disclose AI Use: A Cognition-Affect-Conation Perspective Ignore
Investigating the psychological factors influencing university students' willingness to disclose their use of AI tools in academic work.
GitHub stars n/a Velocity flat History 1 snapshot AI Ethics Apr 23
Fairness under uncertainty in sequential decisions Ignore
A taxonomy and framework for understanding and mitigating fairness risks in sequential decision-making systems under uncertainty.
GitHub stars n/a Velocity flat History 1 snapshot Fairness in AI Apr 23