APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation Build Now
A framework for automating the summarization and interpretation of privacy policies, built on a newly created parallel corpus and a hybrid model that outperforms major LLMs.
GitHub 100 stars Velocity flat History 1 snapshot Legal AI Apr 30 Pending High viability
Instruction-Guided Poetry Generation in Arabic and Its Dialects Build Now
Empower Arabic speakers to create and analyze poetry through an AI-driven, instruction-based platform.
GitHub 100 stars Velocity flat History 1 snapshot Generative AI Apr 30 Pending High viability
GUI Agents with Reinforcement Learning: Toward Digital Inhabitants Build Now
Transform GUI automation through Reinforcement Learning to create scalable, intelligent digital agents.
GitHub 100 stars Velocity flat History 1 snapshot Agents Apr 30 Pending High viability
Robust Lightweight Crack Classification for Real-Time UAV Bridge Inspection Build Now
A lightweight, real-time crack classification framework for UAV bridge inspections, balancing accuracy and speed.
GitHub stars n/a Velocity flat History pending Computer Vision Apr 30 Pending High viability
D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery Build Now
D3-Gym provides verifiable environments for data-driven scientific discovery, enabling AI agents to solve real-world research tasks.
GitHub stars n/a Velocity flat History pending AI Agents Apr 30 Pending High viability
NeocorRAG: Less Irrelevant Information, More Explicit Evidence, and More Effective Recall via Evidence Chains Build Now
NeocorRAG enhances RAG systems by optimizing retrieval quality through evidence chains, achieving state-of-the-art performance with reduced token usage.
GitHub stars n/a Velocity flat History pending Retrieval-Augmented Generation Apr 30 Pending High viability
DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures Build Now
Automated diagnostics for transformer models that detect, categorize, and pinpoint root causes of faults, improving repair accuracy for practitioners.
GitHub stars n/a Velocity flat History pending AI Debugging Apr 30 Pending High viability
Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding Build Now
Improve accuracy and efficiency of Text-to-SQL queries for enterprise databases using Template Constrained Decoding.
GitHub stars n/a Velocity flat History 1 snapshot AI/ML Apr 30 Pending High viability
A Collective Variational Principle Unifying Bayesian Inference, Game Theory, and Thermodynamics Build Now
A unified framework for multi-agent systems that connects Bayesian inference, game theory, and thermodynamics through collective free energy minimization.
GitHub stars n/a Velocity flat History pending AI Theory Apr 30 Pending High viability
MCPHunt: An Evaluation Framework for Cross-Boundary Data Propagation in Multi-Server MCP Agents Build Now
MCPHunt is a benchmark framework for evaluating cross-boundary data propagation in multi-server MCP agents, identifying vulnerabilities and testing mitigation strategies.
GitHub stars n/a Velocity flat History pending Agents Apr 30 Pending High viability
TIO-SHACL: Comprehensive SHACL validation for TMF Intent Ontologies Build Now
A comprehensive SHACL validation framework for TMF Intent Ontologies to ensure correctness of network intents before deployment.
GitHub stars n/a Velocity flat History pending Telecommunications Network Management Apr 30 Pending High viability
RIHA: Report-Image Hierarchical Alignment for Radiology Report Generation Build Now
A hierarchical alignment transformer for radiology report generation that precisely maps medical images to structured reports.
GitHub stars n/a Velocity flat History pending Medical AI Apr 30 Code High viability
Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering Build Now
MED-VRAG enhances medical question answering by integrating multimodal retrieval with visual content from document pages.
GitHub stars n/a Velocity flat History pending Medical AI Apr 30 Code High viability
KellyBench: A Benchmark for Long-Horizon Sequential Decision Making Build Now
KellyBench is a new benchmark and API for evaluating long-horizon sequential decision-making in sports betting markets, revealing significant room for improvement in frontier models.
GitHub stars n/a Velocity flat History pending Sequential Decision Making Apr 30 Code High viability
RuC: HDL-Agnostic Rule Completion Benchmark Generation Build Now
RuC is a grammar-driven benchmark generator for RTL code completion that enables controlled and scalable evaluation of LLMs in hardware development.
GitHub stars n/a Velocity flat History pending RTL Code Completion Benchmark Apr 30 Code High viability
TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions Build Now
TransVLM is a vision-language model for detecting shot transitions in videos, enhanced with optical flow and deployed to production.
GitHub stars n/a Velocity flat History pending Video Understanding Apr 30 Code High viability
Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors Build Now
A novel attack that steals secrets from local LLM fine-tuning by exploiting supply-chain model code backdoors, achieving high secret leakage without compromising primary tasks.
GitHub stars n/a Velocity flat History pending LLM Security Apr 30 Code High viability
Intent2Tx: Benchmarking LLMs for Translating Natural Language Intents into Ethereum Transactions Build Now
A high-fidelity benchmark for translating natural language intents into Ethereum transactions, revealing LLM limitations in real-world Web3 interactions.
GitHub stars n/a Velocity flat History pending Web3 AI Apr 30 Code High viability
Language Models Refine Mechanical Linkage Designs Through Symbolic Reflection and Modular Optimisation Build Now
Language models and numerical optimizers collaborate to systematically improve mechanical linkage designs, reducing geometric error and diagnosing failure modes.
GitHub stars n/a Velocity flat History pending AI for Engineering Design Apr 30 Code High viability
Debiasing Reward Models via Causally Motivated Inference-Time Intervention Build Now
This paper presents a method for debiasing reward models in LLMs, enhancing alignment with human preferences without performance trade-offs.
GitHub stars n/a Velocity flat History pending Bias Mitigation Apr 30 Code High viability
Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation Build Now
A lightweight module accelerates LLM-based recommendation inference by intelligently drafting and verifying item tokens.
GitHub stars n/a Velocity flat History pending LLM Inference Apr 30 Code High viability
CastFlow: Learning Role-Specialized Agentic Workflows for Time Series Forecasting Build Now
A dynamic agentic framework for time series forecasting that refines predictions through iterative planning, action, and reflection, leveraging specialized LLMs and ensemble methods.
GitHub stars n/a Velocity flat History pending Agents Apr 30 Code High viability
End-to-End Evaluation and Governance of an EHR-Embedded AI Agent for Clinicians Build Now
An end-to-end governance framework for EHR-embedded AI agents, demonstrating continuous improvement and operational efficiency for clinical AI systems.
GitHub stars n/a Velocity flat History pending Clinical AI Apr 30 Code High viability
WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments Build Now
A new benchmark for evaluating autonomous GUI agents on complex, multi-application professional workflows, revealing significant performance gaps in current leading models.
GitHub stars n/a Velocity flat History pending Agents Apr 30 Code High viability
Deep Learning-Based Segmentation of Peritoneal Cancer Index Regions from CT Imaging Build Now
This study presents a deep learning approach for non-invasive segmentation of Peritoneal Cancer Index regions from CT scans.
GitHub stars n/a Velocity flat History pending Medical AI Apr 30 Code High viability
SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images Build Now
SpecVQA provides a benchmark dataset for advancing spectral image understanding and visual question answering with multimodal AI models.
GitHub stars n/a Velocity flat History 1 snapshot AI for Scientific Research Apr 30 Code High viability
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows Build Now
A live benchmark for LLM agents that evaluates performance on evolving real-world workflows with verifiable execution traces.
GitHub stars n/a Velocity flat History pending LLM Agent Benchmarking Apr 30 Code High viability
ZAYAN: Disentangled Contrastive Transformer for Tabular Remote Sensing Data Watch
A self-supervised contrastive framework for learning disentangled representations from tabular remote sensing data.
GitHub stars n/a Velocity flat History pending Tabular Data Apr 30 Pending
InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation? Watch
InteractWeb-Bench is a new benchmark for multimodal agents in website generation, simulating non-expert user interactions and revealing limitations in current agents' ability to avoid blind execution.
GitHub stars n/a Velocity flat History pending Multimodal Agents Apr 30 Code
Learning Rate Engineering: From Coarse Single Parameter to Layered Evolution Build Now
DALS is a unified optimizer framework that dynamically scales learning rates per layer and over time, achieving superior performance across diverse training regimes.
GitHub stars n/a Velocity flat History pending LLM Training Apr 30 Code High viability
METASYMBO: Multi-Agent Language-Guided Metamaterial Discovery via Symbolic Latent Evolution Build Now
MetaSymbO is a multi-agent framework for language-guided metamaterial discovery, using symbolic evolution to generate novel and valid microstructures.
GitHub stars n/a Velocity flat History pending Generative Design Apr 30 Code High viability
Heterogeneous Scientific Foundation Model Collaboration Build Now
Eywa is a framework enabling language models to orchestrate and reason over diverse scientific foundation models for complex, multi-modal tasks.
GitHub stars n/a Velocity flat History pending Foundation Models Apr 30 Code High viability
BrainDINO: A Brain MRI Foundation Model for Generalizable Clinical Representation Learning Build Now
A self-supervised foundation model trained on millions of brain MRIs that generalizes across diverse clinical tasks with minimal labeled data.
GitHub stars n/a Velocity flat History pending Medical AI Apr 30 Code High viability
The TEA Nets framework combines AI and cognitive network science to model targets, events and actors in text Build Now
A Python library for extracting Target-Event-Agent networks from text to perform interpretable emotion detection and semantic analysis, comparing human and LLM responses.
GitHub stars n/a Velocity flat History pending NLP Analysis Apr 30 Code High viability
Improving Graph Few-shot Learning with Hyperbolic Space and Denoising Diffusion Build Now
The IMPRESS framework improves graph few-shot learning by using hyperbolic space and denoising diffusion for better representation and generalization.
GitHub stars n/a Velocity flat History pending Graph ML Apr 30 Code High viability
Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future Build Now
A survey and practical guidance for building, evaluating, and integrating LLM systems across the full peer review workflow.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 30 Code High viability
COHERENCE: Benchmarking Fine-Grained Image-Text Alignment in Interleaved Multimodal Contexts Build Now
A new benchmark for evaluating fine-grained image-text alignment in interleaved multimodal contexts, crucial for real-world document understanding.
GitHub stars n/a Velocity flat History pending Multimodal AI Apr 30 Code High viability
Exploring Interaction Paradigms for LLM Agents in Scientific Visualization Build Now
This paper evaluates interaction paradigms for LLM agents in scientific visualization, offering insights for practical deployment.
GitHub stars n/a Velocity flat History pending Scientific Visualization Apr 30 Code High viability
MIFair: A Mutual-Information Framework for Intersectionality and Multiclass Fairness Build Now
MIFair is a unified framework for bias assessment and mitigation based on mutual information, addressing intersectionality and multiclass fairness with a flexible metric and regularization-based training.
GitHub stars n/a Velocity flat History pending Fairness in AI Apr 30 Code High viability
MM-StanceDet: Retrieval-Augmented Multi-modal Multi-agent Stance Detection Build Now
A multi-agent framework with retrieval augmentation and debate stages for robust multimodal stance detection.
GitHub stars n/a Velocity flat History pending Multi-modal Agents Apr 30 Code High viability
From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation Watch
A multimodal AI that reliably translates circuit diagrams into hardware code, overcoming a critical 'mirage' defect and achieving state-of-the-art performance.
Multimodal Code Generation Apr 30 High viability
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction Build Now
A schema-grounded AI memory system that reliably extracts and validates facts, moving beyond simple text retrieval for agents.
GitHub stars n/a Velocity flat History pending AI Memory Apr 30 Code High viability
RAY-TOLD: Ray-Based Latent Dynamics for Dense Dynamic Obstacle Avoidance with TDMPC Build Now
RAY-TOLD is a hybrid control architecture for robots that integrates LiDAR data into latent dynamics and uses MPPI with RL for dense crowd navigation.
GitHub stars n/a Velocity flat History pending Robotics Apr 30 Code High viability
LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis Build Now
Leverages LLMs to refine graph structures derived from EEG signals for more accurate and interpretable seizure diagnosis.
GitHub stars n/a Velocity flat History pending Medical AI Apr 30 Code High viability
WaferSAGE: Large Language Model-Powered Wafer Defect Analysis via Synthetic Data Generation and Rubric-Guided Reinforcement Learning Build Now
WaferSAGE is an LLM-powered framework for wafer defect analysis using synthetic data and reinforcement learning, enabling on-premise deployment of small, specialized models.
GitHub stars n/a Velocity flat History pending Semiconductor AI Apr 30 Code High viability
FlexiTac: A Low-Cost, Open-Source, Scalable Tactile Sensing Solution for Robotic Systems Build Now
An open-source, low-cost, and scalable piezoresistive tactile sensing solution for robotic end-effectors, enabling advanced tactile learning pipelines.
GitHub stars n/a Velocity flat History pending Robotics Tactile Sensing Apr 30 Code High viability
BoostLoRA: Growing Effective Rank by Boosting Adapters Build Now
BoostLoRA enhances parameter-efficient fine-tuning by iteratively merging small adapters to grow effective rank, achieving state-of-the-art performance without inference overhead.
GitHub stars n/a Velocity flat History pending LLM Fine-tuning Apr 30 Code High viability
Post-Optimization Adaptive Rank Allocation for LoRA Build Now
PARA is a data-free method that reduces LoRA parameters by 75-90% while preserving performance and integrates seamlessly into existing fine-tuning pipelines.
GitHub stars n/a Velocity flat History pending LLM Fine-tuning Optimization Apr 30 Code High viability
ClipTBP: Clip-Pair based Temporal Boundary Prediction with Boundary-Aware Learning for Moment Retrieval Build Now
A framework for video moment retrieval that improves accuracy by learning relationships between answer segments and predicting temporal boundaries.
GitHub stars n/a Velocity flat History pending Video Understanding Apr 30 Code High viability
Robust Learning on Heterogeneous Graphs with Heterophily: A Graph Structure Learning Approach Build Now
A unified framework for robust representation learning on heterogeneous graphs with heterophily, addressing noisy connectivity for improved performance.
GitHub stars n/a Velocity flat History pending Graph Neural Networks Apr 30 Code High viability
VibroML: an automated toolkit for high-throughput vibrational analysis and dynamic instability remediation of crystalline materials using machine-learned potentials Build Now
An open-source toolkit for automated vibrational analysis and structural remediation of crystalline materials using machine-learned potentials, enabling faster discovery of stable polymorphs.
GitHub stars n/a Velocity flat History pending Materials Science AI Apr 30 Code High viability
Beyond the Mean: Within-Model Reliable Change Detection for LLM Evaluation Build Now
A new metric for LLM evaluation that detects reliable changes in model performance, revealing bidirectional improvements and deteriorations missed by aggregate scores.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 30 Code High viability
Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents Build Now
A methodology for systematically engineering LLM agents by involving subject matter experts, developers, and helper agents, improving development efficiency and complex-query performance.
GitHub stars n/a Velocity flat History pending Agents Apr 30 Code High viability
Machine Collective Intelligence for Explainable Scientific Discovery Build Now
Machine Collective Intelligence autonomously discovers governing equations from data, offering explainable and extrapolatable models superior to deep neural networks.
GitHub stars n/a Velocity flat History pending Scientific Discovery Apr 30 Code High viability
LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning Build Now
A framework that uses LLMs and Answer Set Programming with self-correction to enable task-agnostic nonmonotonic reasoning.
GitHub stars n/a Velocity flat History pending LLMs for Nonmonotonic Reasoning Apr 30 Code High viability
Agent-Agnostic Evaluation of SQL Accuracy in Production Text-to-SQL Systems Build Now
A production-native framework for evaluating Text-to-SQL accuracy without requiring database schema or reference queries.
GitHub stars n/a Velocity flat History pending Text-to-SQL Apr 30 Code High viability
Toward Autonomous SOC Operations: End-to-End LLM Framework for Threat Detection, Query Generation, and Resolution in Security Operations Build Now
An end-to-end LLM framework automating SOC operations for threat detection, query generation, and incident resolution, reducing triage time from hours to minutes.
GitHub stars n/a Velocity flat History pending Security AI Apr 30 Code High viability
Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection Build Now
Develops an activation-level detection system for multi-turn LLM prompt injection attacks, achieving high accuracy across different model families.
GitHub stars n/a Velocity flat History pending LLM Security Apr 30 Code High viability
How Generative AI Disrupts Search: An Empirical Study of Google Search, Gemini, and AI Overviews Watch
An empirical study comparing Google Search, Gemini, and AI Overviews reveals significant differences in information retrieval, source diversity, and robustness, with implications for the generative search ecosystem.
GitHub stars n/a Velocity flat History pending Generative Search Analysis Apr 30 Code
AdaBFL: Multi-Layer Defensive Adaptive Aggregation for Byzantine-Robust Federated Learning Watch
An adaptive aggregation method for federated learning that provides multi-layer defense against Byzantine attacks without requiring server-side data.
GitHub stars n/a Velocity flat History pending Federated Learning Apr 30 Code
Learning to Reason: Targeted Knowledge Discovery and Fuzzy Logic Update for Robust Image Recognition Watch
A method for discovering and integrating implicit knowledge into deep neural networks for robust image recognition, improving generalization.
GitHub stars n/a Velocity flat History pending Computer Vision Apr 30 Code
PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning Ignore
A novel method for aligning multimodal reinforcement learning models without introducing significant distributional drift.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 30 Pending
Pragmos: A Process Agentic Modeling System Watch
Pragmos: A prototype system for collaborative, explainable process modeling using LLMs and specialized tools, co-creating evolving models with human users.
GitHub stars n/a Velocity flat History pending Process Agents Apr 30 Code
Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling Watch
High-signal data filtering and repetition outperform diverse data for sample-efficient German language modeling, achieving state-of-the-art results.
GitHub stars n/a Velocity flat History pending LLM Training Apr 30 Code
HAVEN: Hybrid Automated Verification ENgine for UVM Testbench Synthesis with LLMs Watch
A hybrid engine that uses LLM agents and a protocol-aware DSL with templates to generate UVM testbenches and sequences for IC verification.
Hardware Verification Apr 30
PhyCo: Learning Controllable Physical Priors for Generative Motion Watch
Develop controllable generative motion models with integrated physical priors for enhanced animation realism.
GitHub stars n/a Velocity flat History 1 snapshot AI for Animation and Simulation Apr 30 Code
TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering Watch
TopBench: A new benchmark for evaluating LLMs on implicit prediction and reasoning over tabular data, revealing current model limitations.
GitHub stars n/a Velocity flat History pending Table QA Apr 30 Code
When Agents Evolve, Institutions Follow Ignore
This paper explores how historical political institutions can inform the design of multi-agent systems.
GitHub stars n/a Velocity flat History pending Agents Apr 30 Pending
Beyond the Training Distribution: Mapping Generalization Boundaries in Neural Program Synthesis Watch
A controlled environment and methodology to rigorously assess and improve the out-of-distribution generalization of neural program synthesis models.
GitHub stars n/a Velocity flat History pending Program Synthesis Apr 30 Code
TypeBandit: Type-Level Context Allocation and Reweighting for Effective Attribute Completion in Heterogeneous Graph Neural Networks Watch
TypeBandit: A model-agnostic methodology for heterogeneous attribute completion that addresses type-dependent information asymmetry.
GitHub stars n/a Velocity flat History pending Graph Neural Networks Apr 30 Code
The Inverse-Wisdom Law: Architectural Tribalism and the Consensus Paradox in Agentic Swarms Watch
This research challenges the 'Wisdom of the Crowd' assumption in AI agent swarms, demonstrating the 'Inverse-Wisdom Law', under which architectural tribalism drives consensus on erroneous trajectories, and proposes a 'Heterogeneity Mandate' for resilient architectures.
GitHub stars n/a Velocity flat History pending Agents Apr 30 Code
What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design Watch
Guidelines for creating adversarial, difficult, and legible benchmark tasks for terminal-agent evaluations to improve LLM capability assessment.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 30 Code
A Grid-Aware Agent-Based Model for Analyzing Electric Vehicle Charging Systems Ignore
A configurable, grid-aware agent-based model for analyzing electric vehicle charging systems, integrating heterogeneous EV behavior and power allocation.
GitHub stars n/a Velocity flat History pending Simulation Apr 30 Code
Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIs Ignore
A dataset mapping LLM mathematical reasoning, anxiety, and confidence across simulated student and AI personas.
GitHub stars n/a Velocity flat History pending AI Education Apr 30 Code
Profiles of AI Dependency: A Latent Class Analysis of Filipino Students' Academic Competencies Ignore
Identifies four distinct profiles of AI dependency among Filipino students, highlighting a need for AI literacy integration in higher education.
GitHub stars n/a Velocity flat History pending AI in Education Apr 30 Code
Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading Ignore
This research highlights the critical impact of prompt optimization on LLM evaluation, suggesting a need for more dynamic and model-specific assessment frameworks.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 30 Code
Graph World Models: Concepts, Taxonomy, and Future Directions Ignore
A taxonomy and formalization of graph world models for improved environmental representation, prediction, and planning in AI agents.
GitHub stars n/a Velocity flat History pending World Models Apr 30 Code
ITS-Mina: A Harris Hawks Optimization-Based All-MLP Framework with Iterative Refinement and External Attention for Multivariate Time Series Forecasting Ignore
An all-MLP framework for multivariate time series forecasting that uses iterative refinement and external attention, tuned by Harris Hawks Optimization.
GitHub stars n/a Velocity flat History pending Time Series Forecasting Apr 30 Code
Measurement Risk in Supervised Financial NLP: Rubric and Metric Sensitivity on JF-ICR Ignore
A framework for auditing financial NLP benchmarks to ensure reliable model selection and deployment by addressing measurement risk.
GitHub stars n/a Velocity flat History pending Financial NLP Apr 30 Code
Safe Bilevel Delegation (SBD): A Formal Framework for Runtime Delegation Safety in Multi-Agent Systems Ignore
A formal framework for runtime delegation safety in hierarchical multi-agent systems, balancing safety and efficiency.
GitHub stars n/a Velocity flat History pending Multi-Agent Systems Apr 30 Code
Beyond Semantics: Measuring Fine-Grained Emotion Preservation in Small Language Model-Based Machine Translation Ignore
Evaluating the emotional nuance preservation of small language models in machine translation using a fine-grained emotion dataset.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 30 Code
Mechanized Foundations of Structural Governance: Machine-Checked Proofs for Governed Intelligence Ignore
Mechanized proofs for structural governance in cognitive workflow systems, including coinductive safety predicates and governance invariance.
GitHub stars n/a Velocity flat History pending AI Governance Apr 30 Pending
Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor Ignore
This research reveals that LLM political bias audits are significantly influenced by sycophancy, where models adapt responses based on inferred user identity rather than fixed ideology.
GitHub stars n/a Velocity flat History pending LLM Bias Apr 30 Code
Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes Watch
A semantics-aware runtime that bridges the agent-OS gap for efficient and correct checkpoint/restore in agent sandboxes.
Agent Sandboxing Runtime Apr 30
CoAX: Cognitive-Oriented Attribution eXplanation User Model of Human Understanding of AI Explanations Ignore
Develops cognitive models to understand and improve human comprehension of AI explanations, reducing the cost of user studies.
GitHub stars n/a Velocity flat History pending Explainable AI (XAI) Apr 30 Code
Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction Watch
A training-free framework for tunnel defect inspection and engineering interpretation using visual recalibration and entity reconstruction.
Industrial Inspection Apr 30
The Two Boundaries: Why Behavioral AI Governance Fails Structurally Ignore
A formal framework for analyzing structural gaps in AI governance, proposing coterminous governance as a testable criterion.
GitHub stars n/a Velocity flat History pending AI Governance Apr 30 Pending
ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era Ignore
ObjectGraph proposes a native file format that bridges document injection and knowledge traversal for more efficient agent interaction.
GitHub stars n/a Velocity flat History 1 snapshot Document Processing Apr 30 Code
Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behavior Ignore
A synthetic corpus and interactive platform for analyzing LLM discourse across diverse human personas and societal topics.
GitHub stars n/a Velocity flat History pending LLM Analysis Apr 30 Code
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists Ignore
A methodological evolution graph infrastructure for AI scientists and agents to understand research lineage and drive automated discovery.
GitHub stars n/a Velocity flat History pending AI Research Infrastructure Apr 30 Code
SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation Ignore
A domain-specific language and agent system for generating spatially accurate and collision-free 3D indoor scenes from natural language.
Generative 3D Scenes Apr 30
Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents Ignore
ValuePlanner introduces a hierarchical framework for proactive embodied agents to resolve motivational conflicts.
GitHub stars n/a Velocity flat History pending Agents Apr 30 Code
Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents Watch
A risk-sensitive bandit controller for LLM coding agents that learns when to use external memory to avoid unsafe injections.
LLM Agents Apr 30
Synthetic Computers at Scale for Long-Horizon Productivity Simulation Ignore
A scalable methodology for creating synthetic computer environments and simulating long-horizon productivity tasks to train agents for improved performance.
Agent Simulation Apr 30
Simulating clinical interventions with a generative multimodal model of human physiology Ignore
A generative multimodal model of human physiology that forecasts trajectories and simulates interventions for personalized medicine.
Generative Health Models Apr 30
From Context to Skills: Can Language Models Learn from Context Skillfully? Ignore
A self-evolving framework that autonomously discovers, refines, and selects context-specific skills for language models without human supervision.
Agents Apr 30
To Build or Not to Build? Factors that Lead to Non-Development or Abandonment of AI Systems Ignore
Investigates factors influencing the non-development or abandonment of AI systems throughout the development lifecycle.
GitHub stars n/a Velocity flat History pending Responsible AI Apr 30 Code
ANCORA: Learning to Question via Manifold-Anchored Self-Play for Verifiable Reasoning Ignore
An anchored-curriculum framework where a unified policy alternates between proposing novel specifications and solving them with verified solutions for self-improvement.
Verifiable Reasoning Apr 30
Investigating More Explainable and Partition-Free Compositionality Estimation for LLMs: A Rule-Generation Perspective Ignore
A novel rule-generation perspective for estimating LLM compositionality, addressing explainability and data leakage issues.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 30 Code
Modeling Clinical Concern Trajectories in Language Model Agents Ignore
A lightweight agent architecture with explicit state dynamics generates continuous escalation pressure signals, making LLM agents more clinically legible by revealing accumulating risk prior to escalation.
Medical AI Apr 30
PROMISE-AD: Progression-aware Multi-horizon Survival Estimation for Alzheimer's Disease Progression and Dynamic Tracking Ignore
A leakage-safe survival framework for predicting Alzheimer's disease progression and dynamic tracking using temporal Transformers.
Medical AI Apr 30
Exploring the Adoption Intention in Using AI-Enabled Educational Tools Among Preservice Teachers in the Philippines: A Partial-Least Square Modeling Ignore
Examines factors influencing pre-service teachers' intention to use AI-enabled educational tools, finding internal motivation to be key.
AI in Education Apr 30
In-Context Prompting Obsoletes Agent Orchestration for Procedural Tasks Ignore
An in-context prompting method for procedural tasks that eliminates the need for external agent orchestration frameworks, yielding better performance and fewer failures.
LLM Agents Apr 30
Contextual Agentic Memory is a Memo, Not True Memory Ignore
Critiques current agentic memory systems, arguing that contextual memory acts as a memo rather than true memory, and proposes a new understanding of memory in AI.
GitHub stars n/a Velocity flat History pending Agents Apr 30 Code
Autonomous Traffic Signal Optimization Using Digital Twin and Agentic AI for Real-Time Decision-Making Ignore
Agentic AI and digital twins optimize traffic signals in real time, reducing waiting times and improving traffic flow.
Agents Apr 30
Consumer Attitudes Towards AI in Digital Health: A Mixed-Methods Survey in Australia Ignore
Consumer attitudes in Australia show a preference for AI-generated health summaries based on quality and empathy, despite accuracy concerns.
Medical AI Apr 30
Generative structure search for efficient and diverse discovery of molecular and crystal structures Ignore
A unified framework for generative structure search that combines diffusion models with physical forces to accelerate the discovery of diverse molecular and crystal structures.
Materials Discovery Apr 30
Building Persona-Based Agents On Demand: Tailoring Multi-Agent Workflows to User Needs Ignore
A pipeline for on-demand persona generation in agentic platforms allows for dynamic crafting of AI personas to match user needs and task contexts, improving efficiency and appropriateness.
Agents Apr 30
ABC: Any-Subset Autoregression via Non-Markovian Diffusion Bridges in Continuous Time and Space Ignore
A novel diffusion model for generating continuous-time, continuous-space stochastic processes conditioned on partial observations, improving dynamics and enabling arbitrary subset conditioning.
Generative Modeling Apr 30
The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models Ignore
Investigates how visual priming affects the cooperative behavior of Vision-Language Models in the Iterated Prisoner's Dilemma.
Vision-Language Models Apr 30
In-Context Examples Suppress Scientific Knowledge Recall in LLMs Ignore
Demonstrates that in-context examples can suppress scientific knowledge recall in LLMs, shifting computation towards empirical pattern fitting.
LLM Reasoning Apr 30
RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses Ignore
A protocol for verifying and deploying LLM-generated reward hypotheses in reinforcement learning based on policy competence and training phase.
Reinforcement Learning Apr 30
Fairness for distribution network operations and planning Ignore
Reviews fairness notions and metrics for distribution network planning and operations, analyzing their mathematical complexity and impact on resource allocation.
AI for Energy Apr 30
One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness Ignore
Identifies vulnerabilities in cross-modal encoders by detecting hub texts that can lead to misleading similarity scores in applications like image captioning and retrieval.
AI Security Apr 30
Towards Neuro-symbolic Causal Rule Synthesis, Verification, and Evaluation Grounded in Legal and Safety Principles Ignore
A neuro-symbolic framework for synthesizing, verifying, and evaluating causal rules grounded in legal and safety principles for safety-critical domains.
Neuro-symbolic AI Apr 30
Characterizing the Consistency of the Emergent Misalignment Persona Ignore
Characterizes the inconsistent persona of emergent misalignment in LLMs, revealing distinct patterns of coherent and inverted misalignment.
LLM Alignment Apr 30
AI Inference as Relocatable Electricity Demand: A Latency-Constrained Energy-Geography Framework Ignore
This paper proposes an energy-geography framework for geo-distributed AI inference, optimizing placement based on electricity prices, carbon intensity, and latency constraints.
AI Infrastructure Apr 30
Test Before You Deploy: Governing Updates in the LLM Supply Chain Ignore
Proposes a framework for governing LLM updates built on production contracts, risk-category-based testing, and compatibility gates to manage behavioral drift in hosted LLM services.
LLM Supply Chain Governance Apr 30
From LLM-Driven Trading Card Generation to Procedural Relatedness: A Pokémon Case Study Ignore
Investigates LLMs and diffusion models for procedural content generation of trading cards, focusing on personalized designs and player engagement.
Generative Content Apr 30
Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation Ignore
Audits frontier vision-language models for medical visual question answering, highlighting significant trustworthiness issues such as grounding failures and format collapse.
Medical AI Apr 30
Focus Session: Autonomous Systems Dependability in the era of AI: Design Challenges in Safety, Security, Reliability and Certification Ignore
This paper explores design challenges and emerging methodologies for ensuring dependability in autonomous and embedded systems integrating AI/ML components.
AI Safety & Reliability Apr 30
Evaluating Epistemic Guardrails in AI Reading Assistants: A Behavioral Audit of a Minimal Prototype Ignore
This paper introduces a protocol and empirical findings for evaluating how AI reading assistants manage interpretive work, identifying weaknesses where systems redistribute too much meaning-making labor away from the user.
Agents Apr 30
PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations Ignore
The paper introduces a vision-language-action (VLA) model that reformulates pretraining as goal-conditioned reinforcement learning for robotic control.
Robotic Control Apr 30
AgentEconomist: An End-to-end Agentic System Translating Economic Intuitions into Executable Computational Experiments Ignore
AgentEconomist translates economic intuitions into executable computational experiments through a modular interactive system.
Agents Apr 30
Belief-Guided Inference Control for Large Language Model Services via Verifiable Observations Ignore
The paper introduces a framework for adaptive inference control in black-box LLM services to improve response reliability.
Inference Control Apr 30
Learning from Disagreement: Clinician Overrides as Implicit Preference Signals for Clinical AI in Value-Based Care Ignore
Presents a theoretical framework for treating clinician overrides as implicit preference signals for clinical AI, without a practical implementation.
Clinical AI Apr 30
Rethinking Agentic Reinforcement Learning In Large Language Models Ignore
This paper explores the conceptual foundations and future directions of agentic reinforcement learning within large language models, focusing on autonomous agents capable of complex reasoning and planning.
Agents Apr 30
Trace-Level Analysis of Information Contamination in Multi-Agent Systems Ignore
A framework for analyzing information contamination in multi-agent systems by injecting perturbations and measuring trace divergence.
Multi-Agent Systems Apr 30
Design Structure Matrix Modularization with Large Language Models Ignore
This paper explores LLM-based modularization for engineering design but lacks practical deployment signals.
Combinatorial Optimization Apr 30
Knowledge Graph Representations for LLM-Based Policy Compliance Reasoning Ignore
Constructs knowledge graphs from AI policy documents to improve LLM-based compliance reasoning.
AI Policy Compliance Apr 30
Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study Ignore
A layered review of security risks and defense strategies for autonomous agent frameworks, using OpenClaw as a case study.
Agents Apr 30
A Pattern Language for Resilient Visual Agents Ignore
This paper proposes an architectural pattern language for visual agents but lacks practical application signals.
Software Architecture Apr 30
Sampler-Robust Optimization under Generative Models Ignore
Sampler-Robust Optimization (SRO) optimizes decisions against the worst-case sampler induced by perturbing learned generative models.
Optimization Apr 30
Knowledge Affordances for Hybrid Human-AI Information Seeking Ignore
This paper proposes a conceptual framework for enhancing human-AI information seeking through knowledge affordances.
Human-AI Interaction Apr 30
Statistical Channel Fingerprint Construction for Massive MIMO: A Unified Tensor Learning Framework Ignore
A tensor learning framework for constructing statistical channel fingerprints in massive MIMO systems.
Communication Systems Apr 30
Splitting Argumentation Frameworks with Collective Attacks and Supports Ignore
Novel splitting techniques for argumentation formalisms that incorporate collective attacks and supports, generalizing existing frameworks.
AI Reasoning Apr 30
Splitting Assumption-Based Argumentation Frameworks Ignore
A theoretical framework for improving the computational efficiency of assumption-based argumentation by splitting the knowledge base.
Argumentation Frameworks Apr 30
Computing Equilibrium beyond Unilateral Deviation Ignore
This paper introduces a new equilibrium concept in game theory that guarantees existence by minimizing coalitional deviation incentives, with algorithms for average and maximum gain objectives.
Game Theory Apr 30
Leading Across the Spectrum of Human-AI Relationships: A Conceptual Framework for Increasingly Heterogeneous Teams Ignore
A conceptual framework to help leaders understand and manage the evolving spectrum of human and AI roles in decision-making teams.
Human-AI Collaboration Apr 30
Do Sparse Autoencoders Capture Concept Manifolds? Ignore
Develops a theoretical framework to understand how sparse autoencoders capture concept manifolds, revealing suboptimal recovery and motivating new interpretability methods.
LLM Interpretability Apr 30
Taming the Centaur(s) with LAPITHS: a framework for a theoretically grounded interpretation of AI performances Ignore
A framework for theoretically grounded interpretation of AI performances to counter behavioristic tendencies in AI research.
AI Interpretability Apr 30
When Does Structure Matter in Continual Learning? Dimensionality Controls When Modularity Shapes Representational Geometry Ignore
Examines how network architecture, task similarity, and representational dimensionality jointly shape learning in continual learning systems.
Continual Learning Apr 30
Mapping the Methodological Space of Classroom Interaction Research: Scale, Duration, and Modality in an Age of AI Ignore
A framework for mapping classroom interaction research dimensions (scale, duration, modality) to guide research and AI tool design.
AI Education Research Apr 30
Why Self-Supervised Encoders Want to Be Normal Ignore
A theoretical framework recasts the Information Bottleneck as a rate-distortion problem, offering principled distributional regularizers for learning.
LLM Training Apr 30
Normativity and Productivism: Ableist Intelligence? A Degrowth Analysis of AI Sign Language Translation Tools for Deaf People Ignore
Critiques AI sign language translation tools for their ableist nature, arguing they standardize language for profit and fail to capture the human experience of deaf individuals.
AI Ethics Apr 30
Developing a gradient descent-based, physics-constrained attractor Fuzzy Cognitive Map with residual memory and backpropagation through time.
Neural Networks Apr 30