DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling Build Now
A system for constructing debiased multimodal preference data and iteratively training reward models, achieving state-of-the-art performance on multimodal alignment benchmarks.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 21 Pending High viability
SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization Build Now
SCURank enhances summarization by ranking candidate summaries based on information content richness and semantic importance, outperforming traditional methods.
GitHub stars n/a Velocity flat History 1 snapshot LLM Applications Apr 21 Pending High viability
SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models Build Now
SafetyALFRED evaluates multimodal LLMs for embodied agents, revealing a critical gap between hazard recognition and active risk mitigation in real-world scenarios.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Pending High viability
AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos Build Now
A controllable framework for generating adverse weather videos for autonomous driving, improving perception robustness with high visual quality and annotation reusability.
GitHub stars n/a Velocity flat History 1 snapshot Generative Video Apr 21 Pending High viability
From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning Build Now
A multi-agent framework that learns and reuses optimization strategies for generative engines to improve answer quality and citation accuracy.
GitHub stars n/a Velocity flat History pending Generative AI Optimization Apr 21 Pending High viability
VLA Foundry: A Unified Framework for Training Vision-Language-Action Models Build Now
VLA Foundry provides a unified framework for training and evaluating vision-language-action models in robotics.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21 Pending High viability
FASTER: Value-Guided Sampling for Fast RL Build Now
A method to accelerate reinforcement learning by efficiently filtering action candidates during the denoising process, reducing computational cost.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 21 Pending High viability
Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language Build Now
An agentic framework and benchmark for generating executable visual workflows from natural language, addressing costly manual development.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 21 Pending High viability
SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning Build Now
A parameter-efficient fine-tuning framework that uses semantic awareness and adaptive scaling to improve multi-task learning for Large Language Models.
GitHub stars n/a Velocity flat History 1 snapshot LLM Fine-tuning Apr 21 Pending High viability
IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text Build Now
A new benchmark and evaluation framework for assessing LLM performance on Indian financial regulatory text, with publicly available code and dataset.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 21 Pending High viability
Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning Build Now
This research evaluates whether advanced LLMs 'game' formalization by generating proofs that are valid but not faithful to the original logical problem, with code and data available for reproducible analysis.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 21 Pending High viability
Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents Build Now
This paper introduces a four-axis decision alignment framework for enterprise AI agents, enabling granular evaluation of factual precision, reasoning coherence, compliance, and abstention, with a transferable methodology for regulated domains.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Pending High viability
GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models Build Now
GRASPrune: A post-training structured pruning framework for LLMs that significantly reduces parameters and inference costs while maintaining performance.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 21 Code High viability
EgoSelf: From Memory to Personalized Egocentric Assistant Build Now
EgoSelf transforms personal long-term interactions into predictive assistants for AR applications.
GitHub stars n/a Velocity flat History 1 snapshot Egocentric AI Apr 21 Code High viability
Reasoning-Aware AIGC Detection via Alignment and Reinforcement Build Now
A reasoning-aware framework for detecting AI-generated content with interpretable explanations and state-of-the-art accuracy.
GitHub stars n/a Velocity flat History 1 snapshot AI Generated Content Detection Apr 21 Code High viability
Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs Build Now
A novel test set and methodology to expose implicit local and global biases in multilingual LLMs, with code available.
GitHub stars n/a Velocity flat History 1 snapshot LLM Bias Detection Apr 21 Pending High viability
Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery Build Now
A multi-agent review methodology that uses adversarial agents for high-precision, LLM-assisted discovery of software defects.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 21 Pending High viability
ST-Prune: Training-Free Spatio-Temporal Token Pruning for Vision-Language Models in Autonomous Driving Build Now
A training-free, plug-and-play framework for pruning vision-language models in autonomous driving, achieving state-of-the-art performance with significant compression.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language Models Apr 21 Code High viability
Gated Memory Policy Build Now
A visuomotor policy for robotic manipulation that learns when and what to recall from memory, significantly improving performance on non-Markovian tasks.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21 Code High viability
UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction Build Now
A unified audio front-end LLM that handles multiple speech tasks for seamless, full-duplex conversational AI systems.
GitHub stars n/a Velocity flat History 1 snapshot Speech AI Apr 21 Code High viability
OLLM: Options-based Large Language Models Build Now
OLLM enhances LLM controllability and robustness by replacing single next-token prediction with a set of learned options, enabling more efficient and aligned generation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Control and Reasoning Apr 21 Code High viability
Multi-modal Test-time Adaptation via Adaptive Probabilistic Gaussian Calibration Build Now
This paper introduces a novel probabilistic Gaussian model and adaptive rectification technique for multi-modal test-time adaptation, achieving state-of-the-art performance.
GitHub stars n/a Velocity flat History 1 snapshot Multi-modal Test-time Adaptation Apr 21 Pending High viability
Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest Build Now
This research provides a comprehensive benchmark and reproducible framework for evaluating LLMs on core social media analytics tasks, with code and data released.
GitHub stars n/a Velocity flat History 1 snapshot LLM Social Media Analytics Apr 21 Code High viability
Time Series Augmented Generation for Financial Applications Build Now
Introduces a novel evaluation framework and benchmark for assessing LLM agent reasoning in financial time-series analysis, demonstrating near-perfect tool-use accuracy.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents for Finance Apr 21 Code High viability
FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion Build Now
A federated LLM adaptation framework that uses a proxy SLM and heterogeneity-aware fusion to achieve secure, high-performance fine-tuning without compromising client privacy or LLM IP.
GitHub stars n/a Velocity flat History 1 snapshot Federated LLM Adaptation Apr 21 Code High viability
ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety Build Now
ProjLens is an interpretability framework that demystifies backdoor vulnerabilities in multimodal LLMs by analyzing the role of projector layers.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Safety Apr 21 Code High viability
LePREC: Reasoning as Classification over Structured Factors for Assessing Relevance of Legal Issues Build Now
LePREC is a neuro-symbolic framework that improves legal issue identification by combining LLM-generated question-answer pairs with structured statistical reasoning for relevance assessment.
GitHub stars n/a Velocity flat History 1 snapshot Legal AI Apr 21 Code High viability
Beyond Semantic Similarity: A Component-Wise Evaluation Framework for Medical Question Answering Systems with Health Equity Implications Build Now
A component-wise evaluation framework for medical QA systems that reveals health equity implications and performance failures, with code available.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Evaluation Apr 21 Code High viability
On Accelerating Grounded Code Development for Research Build Now
A framework that provides coding agents with instant access to research repositories and technical documentation for real-time, context-aware operation in specialized scientific domains.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 21 Code High viability
A new benchmark for evaluating AI agents on cross-application workflow orchestration via REST APIs, revealing current frontier models score below 10% on realistic business tasks.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Code High viability
The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models Build Now
A systematic analysis of verbal tics in frontier LLMs, revealing significant inter-model variation and highlighting the 'alignment tax' on authentic human-AI interaction.
GitHub stars n/a Velocity flat History 1 snapshot LLM Analysis Apr 21 Code High viability
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution Build Now
A principled framework grounded in game theory for attributing rewards in multi-turn dialogues, enabling language agents to learn social intelligence and achieve state-of-the-art performance.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Pending High viability
Multi-Cycle Spatio-Temporal Adaptation in Human-Robot Teaming Build Now
A framework that unifies task and motion adaptation for human-robot teams, improving efficiency and user experience.
GitHub stars n/a Velocity flat History 1 snapshot Human-Robot Teaming Apr 21 Pending High viability
Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model Build Now
Develops a novel autoregressive model for real-time target speaker extraction that achieves stable performance and superior intelligibility compared to offline baselines.
GitHub stars n/a Velocity flat History 1 snapshot Real-time Audio Processing Apr 21 Code High viability
RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation Build Now
RoboWM-Bench is a new benchmark for evaluating the physical plausibility and robotic executability of video world models in manipulation tasks.
GitHub stars n/a Velocity flat History 1 snapshot Robotic Manipulation Benchmarking Apr 21 Code High viability
Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges Build Now
An open-source benchmark for evaluating LLM agents in realistic cybersecurity Capture The Flag challenges, with a novel partial-credit scoring system.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Code High viability
M$^{2}$GRPO: Mamba-based Multi-Agent Group Relative Policy Optimization for Biomimetic Underwater Robots Pursuit Build Now
A Mamba-based multi-agent policy optimization framework for cooperative underwater robot pursuit, improving success rate and efficiency.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21 Code High viability
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models Build Now
A comprehensive benchmark for detecting hallucinations in large audio-language models across speech, sound, and music.
GitHub stars n/a Velocity flat History 1 snapshot Audio LLMs Apr 21 Pending High viability
CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation Build Now
CoDA enables LLMs to effectively transfer knowledge across domains by aligning latent reasoning representations, significantly improving performance in expertise-scarce areas.
GitHub stars n/a Velocity flat History pending LLM Domain Adaptation Apr 21 Code High viability
Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning Build Now
Leveraging Low-Rank Adaptation (LoRA) to improve off-policy reinforcement learning critics by constraining updates to a low-dimensional subspace, showing consistent gains in critic loss and policy performance.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 21 Code High viability
LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation Build Now
A lightweight binarization framework for deploying large language models in resource-constrained environments with state-of-the-art accuracy.
GitHub stars n/a Velocity flat History 1 snapshot LLM Compression Apr 21 Code High viability
Personalized Benchmarking: Evaluating LLMs by Individual Preferences Build Now
This research introduces personalized benchmarking for LLMs, demonstrating that individual user preferences diverge significantly from aggregate rankings and can be predicted using topic and style features.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 21 Code High viability
PLaMo 2.1-VL Technical Report Build Now
PLaMo 2.1-VL: A lightweight, Japanese-capable Vision Language Model for autonomous devices, excelling in VQA and Visual Grounding for industrial applications.
GitHub stars n/a Velocity flat History 1 snapshot Vision Language Models Apr 21 Code High viability
SimDiff: Depth Pruning via Similarity and Difference Build Now
A novel depth pruning method for LLMs that significantly improves efficiency and performance by jointly considering layer similarity and transformation difference.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 21 Code High viability
Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment Build Now
A novel framework to mitigate cognitive bias in LLM agents by enforcing perspective-invariant reasoning through dialectical alignment, improving fault resolution in ambiguous scenarios.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Code High viability
Sherpa.ai Privacy-Preserving Multi-Party Entity Alignment without Intersection Disclosure for Noisy Identifiers Build Now
A multi-party privacy-preserving entity alignment protocol for federated learning that hides intersection membership and supports noisy identifiers.
GitHub stars n/a Velocity flat History 1 snapshot Privacy-Preserving AI Apr 21 Code High viability
SAHM: A Benchmark for Arabic Financial and Shari'ah-Compliant Reasoning Build Now
SAHM is a new benchmark and dataset for Arabic financial and Shari'ah-compliant reasoning, with an instruction-tuned model to improve LLM performance in this domain.
GitHub stars n/a Velocity flat History 1 snapshot Arabic Financial NLP Apr 21 Code High viability
Industrial Surface Defect Detection via Diffusion Generation and Asymmetric Student-Teacher Network Build Now
An unsupervised industrial defect detection system using diffusion models for data generation and an asymmetric student-teacher network for precise localization.
GitHub stars n/a Velocity flat History 1 snapshot Industrial AI Apr 21 Code High viability
Towards Scalable Lifelong Knowledge Editing with Selective Knowledge Suppression Build Now
LightEdit enables scalable and cost-effective lifelong knowledge editing for LLMs by selectively suppressing outdated information and incorporating new knowledge.
GitHub stars n/a Velocity flat History 1 snapshot LLM Knowledge Editing Apr 21 Code High viability
DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning Build Now
DW-Bench is a new benchmark for evaluating LLMs on data warehouse graph topology reasoning that integrates foreign-key and data-lineage edges; tool-augmented methods significantly outperform static approaches.
GitHub stars n/a Velocity flat History 1 snapshot LLM Benchmarking Apr 21 Pending High viability
CoCo-SAM3: Harnessing Concept Conflict in Open-Vocabulary Semantic Segmentation Build Now
CoCo-SAM3 enhances open-vocabulary semantic segmentation by resolving concept conflicts between prompts for more stable and accurate multi-class inference.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 21 Code High viability
Reinforcement Learning Enabled Adaptive Multi-Task Control for Bipedal Soccer Robots Build Now
A modular reinforcement learning framework enables bipedal soccer robots to adaptively control multiple tasks like walking and kicking, with rapid fall recovery.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21 Code High viability
UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling Build Now
A unified framework for transferring human motion policies to humanoids by learning a shared latent physical language.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21 Code High viability
Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling Build Now
Nexusformer enables stable and inheritable transformer scaling through nonlinear attention expansion, reducing compute and improving performance.
GitHub stars n/a Velocity flat History 1 snapshot Transformer Scaling Apr 21 Code High viability
Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization Build Now
A unified ASR framework with consistency regularization reduces the gap between offline and streaming performance, offering cost-effective and scalable speech recognition.
GitHub stars n/a Velocity flat History 1 snapshot Speech Recognition Apr 21 Code High viability
Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees Build Now
A neuro-symbolic framework that restructures statement autoformalization into a modular pipeline, improving accuracy and error localization for mathematical statements.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 21 Code High viability
RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models Build Now
RDP LoRA uses geometric analysis of LLM representations to identify critical layers for parameter-efficient fine-tuning, achieving superior performance with fewer adapted parameters.
GitHub stars n/a Velocity flat History 1 snapshot LLM Adaptation Apr 21 Code High viability
ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning Build Now
ShadowPEFT introduces a centralized parameter-efficient fine-tuning framework for LLMs that reuses a shared shadow module across layers, offering competitive performance and flexibility.
GitHub stars n/a Velocity flat History 1 snapshot LLM Fine-tuning Apr 21 Code High viability
A neural operator framework for data-driven discovery of stability and receptivity in physical systems Build Now
A data-driven framework using neural operators to discover stability and receptivity properties in physical systems without requiring governing equations.
GitHub stars n/a Velocity flat History 1 snapshot Scientific Discovery Apr 21 Code High viability
Multi-Gait Learning for Humanoid Robots Using Reinforcement Learning with Selective Adversarial Motion Prior Build Now
A humanoid robot learns five distinct gaits using reinforcement learning with a selective adversarial motion prior, achieving better performance and zero-shot sim-to-real transfer.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21 Code High viability
Environmental Sound Deepfake Detection Using Deep-Learning Framework Build Now
A deep learning framework for detecting environmental sound deepfakes, outperforming existing methods with a fine-tuned WavLM model.
GitHub stars n/a Velocity flat History 1 snapshot Audio AI Apr 21 Code High viability
LASER: Learning Active Sensing for Continuum Field Reconstruction Build Now
LASER is a closed-loop framework for active sensing that uses reinforcement learning to guide sensor placement for high-fidelity continuum field reconstruction.
GitHub stars n/a Velocity flat History 1 snapshot Active Sensing AI Apr 21 Code High viability
RoLegalGEC: Legal Domain Grammatical Error Detection and Correction Dataset for Romanian Build Now
A novel dataset and evaluated models for Romanian legal grammatical error detection and correction to improve legal document accuracy.
GitHub stars n/a Velocity flat History 1 snapshot NLP Tools Apr 21 Code High viability
TACENR: Task-Agnostic Contrastive Explanations for Node Representations Build Now
TACENR provides task-agnostic contrastive explanations for node representations in graphs, identifying key attribute, proximity, and structural features.
GitHub stars n/a Velocity flat History 1 snapshot Graph Representation Learning Apr 21 Code High viability
Improved Anomaly Detection in Medical Images via Mean Shift Density Enhancement Build Now
A hybrid anomaly detection framework for medical images using self-supervised learning and Mean Shift Density Enhancement, achieving state-of-the-art results.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 21 Code High viability
DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs Build Now
A privacy-preserving federated framework for log anomaly detection using parameter-efficient LLMs and differential privacy, matching centralized performance with higher precision.
GitHub stars n/a Velocity flat History 1 snapshot Federated Log Anomaly Detection Apr 21 Code High viability
RARE: Redundancy-Aware Retrieval Evaluation Framework for High-Similarity Corpora Build Now
A framework for evaluating retrieval-augmented generation systems on real-world, redundant corpora by tracking atomic facts and improving LLM data generation.
GitHub stars n/a Velocity flat History 1 snapshot RAG Evaluation Apr 21 Code High viability
Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps Build Now
A lightweight system to detect hallucinations in speech models at inference time using attention maps, improving accuracy and generalizability.
GitHub stars n/a Velocity flat History 1 snapshot Speech AI Apr 21 Code High viability
Co-Refine: AI-Powered Tool Supporting Qualitative Analysis Build Now
An AI platform that provides real-time feedback on coding consistency for qualitative researchers, reducing interpretation drift.
GitHub stars n/a Velocity flat History 1 snapshot AI-Powered Tools Apr 21 Code High viability
Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps Build Now
An LLM agent benchmark for evaluating threat hunting capabilities in cybersecurity, revealing current model limitations.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Code High viability
Learning Hybrid-Control Policies for High-Precision In-Contact Manipulation Under Uncertainty Build Now
MATCH is a hybrid control policy for high-precision in-contact manipulation under uncertainty, improving success rates and reducing damage.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21 Code High viability
Think Before Writing: Feature-Level Multi-Objective Optimization for Generative Citation Visibility Build Now
FeatGEO optimizes generative answer engine visibility by abstracting webpages into interpretable features, outperforming token-level methods.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Optimization Apr 21 Code High viability
Multimodal Transformer for Sample-Aware Prediction of Metal-Organic Framework Properties Build Now
A multimodal transformer that predicts properties of experimental metal-organic frameworks by integrating structural identity with X-ray diffraction data for sample-aware analysis.
GitHub stars n/a Velocity flat History 1 snapshot Materials Science AI Apr 21 Code High viability
$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction Watch
A framework that reduces decoding redundancy in Diffusion Large Language Models, significantly accelerating inference speed while maintaining generation quality.
GitHub stars n/a Velocity flat History 1 snapshot LLM Inference Optimization Apr 21 Code
Reasoning Structure Matters for Safety Alignment of Reasoning Models Watch
AltTrain is a post-training method that modifies the reasoning structure of large reasoning models to improve safety alignment without complex RL, using only supervised finetuning on 1K examples.
GitHub stars n/a Velocity flat History pending LLM Safety Apr 21 Code
Revac: A Social Deduction Reasoning Agent Build Now
An AI agent for social deduction games that integrates memory, social graph analysis, and adaptive communication to win competitions.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Pending High viability
HP-Edit: A Human-Preference Post-Training Framework for Image Editing Build Now
A framework and dataset for aligning image editing models with human preferences, improving quality and user satisfaction.
GitHub stars n/a Velocity flat History 1 snapshot Image Editing Apr 21 Pending High viability
A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding Ignore
An agent-based system for enhanced multimodal art retrieval and understanding.
GitHub stars n/a Velocity flat History 1 snapshot fine art analysis Apr 21 Pending
Inductive Subgraphs as Shortcuts: Causal Disentanglement for Heterophilic Graph Learning Watch
A causal inference framework, CD-GNN, that disentangles spurious inductive subgraphs from true causal signals for improved heterophilic graph learning.
GitHub stars n/a Velocity flat History 1 snapshot Graph Learning Apr 21 Code
Fine-Tuning Small Reasoning Models for Quantum Field Theory Watch
This study fine-tunes small reasoning models for Quantum Field Theory using a novel data generation pipeline and publicly releases the data, pipeline, and reasoning traces.
GitHub stars n/a Velocity flat History pending LLM Fine-Tuning Apr 21 Code
Enhancing Construction Worker Safety in Extreme Heat: A Machine Learning Approach Utilizing Wearable Technology for Predictive Health Analytics Watch
An AI system using wearable data to predict heat stress in construction workers, achieving high accuracy and offering interpretable safety insights.
GitHub stars n/a Velocity flat History 1 snapshot Industrial Safety AI Apr 21 High viability
Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control Ignore
Exploits the local linearity of LLMs to steer activations with model-based linear optimal control, enabling precise control over model behavior for safer outputs.
GitHub stars n/a Velocity flat History 1 snapshot AI Model Control Apr 21 Pending
Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports Watch
Reinforcement learning improves LLM accuracy and reasoning for disease classification from radiology reports.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 21 Code
Product-of-Experts Training Reduces Dataset Artifacts in Natural Language Inference Watch
Product-of-Experts training reduces dataset artifacts in Natural Language Inference models by downweighting overconfident examples.
GitHub stars n/a Velocity flat History 1 snapshot LLM Debiasing Apr 21 Code
Has Automated Essay Scoring Reached Sufficient Accuracy? Deriving Achievable QWK Ceilings from Classical Test Theory Watch
Derives achievable QWK ceilings for automated essay scoring from classical test theory, clarifying performance targets and remaining headroom for AES models.
GitHub stars n/a Velocity flat History 1 snapshot Automated Essay Scoring Apr 21 Code
CulturALL: Benchmarking Multilingual and Multicultural Competence of LLMs on Grounded Tasks Ignore
A new benchmark, CulturALL, evaluates LLMs' multilingual and multicultural competence on grounded tasks, revealing significant room for improvement.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 21 Pending
When Graph Structure Becomes a Liability: A Critical Re-Evaluation of Graph Neural Networks for Bitcoin Fraud Detection under Temporal Distribution Shift Ignore
A rigorous evaluation of Graph Neural Networks for Bitcoin fraud detection reveals that simpler models outperform GNNs under realistic temporal shifts, with code and a new protocol released for reproducible research.
GitHub stars n/a Velocity flat History pending Fraud Detection Apr 21 Code
Adaptive MSD-Splitting: Enhancing C4.5 and Random Forests for Skewed Continuous Attributes Watch
Adaptive MSD-Splitting enhances C4.5 and Random Forests for skewed continuous attributes, improving accuracy and efficiency.
GitHub stars n/a Velocity flat History 1 snapshot Decision Trees Apr 21 Code
Design Rules for Extreme-Edge Scientific Computing on AI Engines Ignore
This paper characterizes AI Engines on FPGAs for extreme-edge scientific computing, providing a metric to determine when they outperform programmable logic.
GitHub stars n/a Velocity flat History 1 snapshot Edge AI Hardware Apr 21 Code
Detecting Data Contamination in Large Language Models Ignore
A study evaluating state-of-the-art methods for detecting data contamination in LLMs, finding current methods unreliable.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 21 Code
Streamliners for Answer Set Programming Ignore
This paper adapts LLMs to generate streamliners for Answer Set Programming, achieving significant speedups on benchmark problems.
GitHub stars n/a Velocity flat History 1 snapshot AI for Programming Apr 21 Pending
Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning Build Now
A human-machine co-boosted active learning framework for efficient and effective bug report identification from GitHub.
GitHub stars n/a Velocity flat History 1 snapshot Software Engineering Apr 20 Code High viability
Mesh Memory Protocol: Semantic Infrastructure for Multi-Agent LLM Systems Watch
A semantic infrastructure protocol for multi-agent LLM systems that enables cross-session cognitive collaboration and memory persistence, running in production.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21
Revisiting Catastrophic Forgetting in Continual Knowledge Graph Embedding Ignore
A new evaluation protocol for continual knowledge graph embedding that addresses overlooked entity interference, leading to more accurate performance assessments.
GitHub stars n/a Velocity flat History 1 snapshot Knowledge Graphs Apr 21 Code
Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs Ignore
This study reveals distinct and often unreliable multi-turn conversational behaviors in LLMs when engaging with repair attempts.
GitHub stars n/a Velocity flat History 1 snapshot LLM Interaction Apr 21 Code
Tadabur: A Large-Scale Quran Audio Dataset Ignore
Tadabur is a large-scale Quran audio dataset with 1400+ hours from 600+ reciters, offering substantial variation for Quranic speech research.
GitHub stars n/a Velocity flat History 1 snapshot Audio Datasets Apr 21 Pending
Human-Guided Harm Recovery for Computer Use Agents Build Now
A human-guided system for recovering AI agents from harmful states, featuring a benchmark and a reward model for preference-aligned remediation.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 Pending High viability
Hierarchically Robust Zero-shot Vision-language Models Build Now
A hierarchical adversarial fine-tuning framework for vision-language models that improves zero-shot classification robustness across class hierarchies.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language Models Apr 20 Pending High viability
OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens Build Now
OmniMouse scales multi-modal brain models on massive neural data, achieving state-of-the-art performance and revealing data-limited scaling properties.
GitHub stars n/a Velocity flat History 1 snapshot Neuroscience AI Apr 20 Pending High viability
Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring Build Now
A novel framework audits LLM similarity scoring, revealing consistent biases and providing a tool to compare model behavior.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 20 Code High viability
Towards Optimal Agentic Architectures for Offensive Security Tasks Build Now
An empirical study of agentic security architectures reveals optimal coordination strategies for offensive tasks, balancing coverage and cost.
GitHub stars n/a Velocity flat History 1 snapshot AI Security Agents Apr 20 Code High viability
REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction Build Now
REVEAL aligns retinal images with clinical risk factors using vision-language models and contrastive learning for early Alzheimer's and dementia prediction, outperforming existing methods.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 20 Code High viability
Beyond Coefficients: Forecast-Necessity Testing for Interpretable Causal Discovery in Nonlinear Time-Series Models Build Now
This paper introduces a 'forecast-necessity' testing framework for interpreting causal discovery in nonlinear time-series models, moving beyond coefficient magnitude for more reliable causal reasoning.
GitHub stars n/a Velocity flat History 1 snapshot Causal Discovery Apr 20 Code High viability
Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling Build Now
This work proposes an autoregressive sequence modeling approach for healthcare AI that handles missing modalities in patient data, improving diagnostic accuracy and interpretability.
GitHub stars n/a Velocity flat History 1 snapshot Healthcare AI Apr 20 Code High viability
How Adversarial Environments Mislead Agentic AI? Build Now
A framework for testing the robustness of tool-integrated AI agents against adversarial manipulation of their tools, revealing critical vulnerabilities.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 Code High viability
Skillful Global Ocean Emulation and the Role of Correlation-Aware Loss Build Now
A novel loss function significantly improves ocean forecasting skill by accounting for variable correlations, offering a better background for data assimilation.
GitHub stars n/a Velocity flat History 1 snapshot Climate AI Apr 20 Code High viability
Towards Understanding the Robustness of Sparse Autoencoders Build Now
This research introduces a method to enhance Large Language Model robustness against jailbreak attacks by integrating Sparse Autoencoders at inference time, showing significant reductions in attack success rates.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 20 Code High viability
Characterizing AlphaEarth Embedding Geometry for Agentic Environmental Reasoning Build Now
An agentic system leverages the geometric properties of Earth observation embeddings for improved environmental reasoning and query decomposition.
GitHub stars n/a Velocity flat History 1 snapshot Geospatial AI Apr 20 Code High viability
Error-free Training for MedMNIST Datasets Build Now
A novel method for error-free training of ML models on MedMNIST datasets, achieving perfect accuracy on most biomedical classification tasks.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 20 Code High viability
From Natural Language to Executable Narsese: A Neuro-Symbolic Benchmark and Pipeline for Reasoning with NARS Build Now
A neuro-symbolic pipeline and benchmark for translating natural language reasoning into executable formal logic, enabling more reliable AI reasoning.
GitHub stars n/a Velocity flat History 1 snapshot Neuro-Symbolic Reasoning Apr 20 Code High viability
ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System Build Now
ARES is a framework that systematically discovers and mitigates dual vulnerabilities in LLM reward models and core policies, enhancing safety robustness while preserving capabilities.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 20 Code High viability
AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories Ignore
Develops an agent that automates the process of reproducing and ablating components in virtual cell repositories to identify critical factors.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents for Scientific Research Apr 21
Fairness Audits of Institutional Risk Models in Deployed ML Pipelines Ignore
A replicable methodology for auditing deployed institutional risk models, revealing how disparities in resource allocation emerge and compound across the ML pipeline.
GitHub stars n/a Velocity flat History 1 snapshot Fairness Audits Apr 21
Evaluation-driven Scaling for Scientific Discovery Ignore
A framework for scaling evaluation-driven discovery loops in language models, demonstrating significant gains across scientific problems.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 21
Towards Energy Impact on AI-Powered 6G IoT Networks: Centralized vs. Decentralized Ignore
Analyzes energy impact of centralized vs. decentralized AI for 6G IoT networks, showing distributed models reduce electricity consumption by up to 70% while maintaining predictive accuracy.
GitHub stars n/a Velocity flat History 1 snapshot IoT Networks AI Apr 21
Distillation Traps and Guards: A Calibration Knob for LLM Distillability Ignore
A post-hoc calibration method to control LLM distillability via reinforcement fine-tuning, mitigating distillation traps and enabling better knowledge transfer or model IP protection.
GitHub stars n/a Velocity flat History 1 snapshot LLM Distillation Apr 21
Benign Overfitting in Adversarial Training for Vision Transformers Ignore
Theoretical analysis of adversarial training for Vision Transformers reveals conditions for benign overfitting, improving robustness.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 21 Code
S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection Ignore
A semi-supervised meta-additive model that automatically identifies informative variables and updates similarity matrices for robust predictions.
GitHub stars n/a Velocity flat History 1 snapshot Statistical Modeling Apr 21 Code
Self-Improving Tabular Language Models via Iterative Group Alignment Ignore
A self-improving framework for tabular data generation that uses automated quality signals to iteratively refine language models, outperforming existing methods in fidelity and utility.
GitHub stars n/a Velocity flat History 1 snapshot Tabular Data Generation Apr 21
Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic Ignore
Applies multi-modal reasoning with LLMs to visual semantic arithmetic tasks.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 21 Code
Large Language Models Exhibit Normative Conformity Ignore
Investigating normative conformity in LLMs to understand and potentially control their behavior in multi-agent systems.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 21 Code
Generalization at the Edge of Stability Ignore
A theoretical framework for understanding generalization in neural network training by analyzing optimizer dynamics as fractal attractors.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 21 Code
An AI Agent Execution Environment to Safeguard User Data Ignore
An execution environment that safeguards user data by enforcing user-defined data access permissions without trusting the agent.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 21
Harmful Intent as a Geometrically Recoverable Feature of LLM Residual Streams Watch
Harmful intent is a geometrically recoverable feature in LLM residual streams, detectable with high accuracy across various models and alignment states.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety & Alignment Apr 20 Pending
Attention-based Multi-modal Deep Learning Model of Spatio-temporal Crop Yield Prediction with Satellite, Soil and Climate Data Ignore
An attention-based multi-modal deep learning model for spatio-temporal crop yield prediction using satellite, soil, and climate data.
GitHub stars n/a Velocity flat History 1 snapshot Agricultural AI Apr 21
BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps Ignore
A novel tokenization method for symbolic music that represents uniform temporal steps, improving generation quality and efficiency.
GitHub stars n/a Velocity flat History 1 snapshot Generative Music Apr 21
Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms Ignore
Evaluating the trade-offs of small language models under agent paradigms for efficient deployment in resource-constrained settings.
GitHub stars n/a Velocity flat History 1 snapshot LLM Deployment Apr 21
LLM-as-Judge Framework for Evaluating Tone-Induced Hallucination in Vision-Language Models Watch
A new benchmark and evaluation framework, Ghost-100, assesses how prompt tone induces hallucination in Vision-Language Models, revealing nuanced sensitivities across different model families and tasks.
GitHub stars n/a Velocity flat History 1 snapshot Vision-Language Models Apr 20 Code
Explicit Trait Inference for Multi-Agent Coordination Ignore
A novel method for LLM-based multi-agent systems to infer and track partner traits, improving coordination in complex tasks.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21
Cross-Model Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Across Three Large Language Models Ignore
Compares the repeated-generation consistency of exercise prescriptions across three LLMs, revealing fundamentally different generative behaviors and implications for reliable deployment.
GitHub stars n/a Velocity flat History 1 snapshot LLM Consistency Analysis Apr 21
GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes Ignore
This paper introduces GOLD-BEV, a framework for dense semantic Bird's-Eye-View mapping of dynamic scenes using synchronized aerial imagery for supervision during training.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 21
How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning Ignore
Analyzing how answer tokens read reasoning traces in LLMs to improve quantitative reasoning accuracy.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 21
EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training Ignore
A novel reinforcement learning approach for LLM post-training that adaptively optimizes baseline selection to reduce variance and improve performance across various tasks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 21
Intentional Updates for Streaming Reinforcement Learning Ignore
A novel approach to streaming reinforcement learning that aims for predictable per-step changes in function output, leading to state-of-the-art performance.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 21
Counting Worlds Branching Time Semantics for post-hoc Bias Mitigation in generative AI Ignore
This research proposes a formal logic (CTLF) with counting worlds semantics to reason about and mitigate bias in a series of generative AI outputs, offering theoretical guarantees for fairness.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Apr 21
A Dual Perspective on Synthetic Trajectory Generators: Utility Framework and Privacy Vulnerabilities Ignore
A framework for evaluating the utility of synthetic trajectory generators and identifying privacy vulnerabilities through adversarial attacks.
GitHub stars n/a Velocity flat History 1 snapshot Synthetic Data Apr 21
Learning Lifted Action Models from Unsupervised Visual Traces Ignore
A deep learning framework for learning lifted action models from unsupervised visual traces, using mixed-integer linear programming for correction.
GitHub stars n/a Velocity flat History 1 snapshot AI Planning Apr 21
Integrating Anomaly Detection into Agentic AI for Proactive Risk Management in Human Activity Ignore
A conceptual framework for integrating anomaly detection into agentic AI to proactively manage human activity risks, such as falls, by identifying subtle movement deviations.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21
Safety-Critical Contextual Control via Online Riemannian Optimization with World Models Ignore
A theoretical framework for safety-critical control using online Riemannian optimization with world models, focusing on sample-based density estimation.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 21
MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation Watch
A new benchmark dataset and task for evaluating gender-aware morphological generation in multilingual LLMs, revealing significant gaps in current models.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 20 Code
ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation Ignore
A human-symbiotic agent paradigm for cross-user autonomous cooperation with layered identity, scoped authorization, and action-level accountability.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21
Relational AI in Education: Reciprocity, Participatory Design, and Indigenous Worldviews Ignore
This paper explores how to design AI in education to foster relational learning, drawing inspiration from Indigenous worldviews and participatory design.
GitHub stars n/a Velocity flat History 1 snapshot AI in Education Apr 21
One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models Ignore
A novel training method for transformers that uses denoising recursion to improve reasoning on complex tasks by learning intermediate refinement paths.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 20 Pending
HELM: Harness-Enhanced Long-horizon Memory for Vision-Language-Action Manipulation Watch
HELM is a model-agnostic framework that enhances long-horizon manipulation by addressing memory, verification, and recovery gaps with an episodic memory module, state verifier, and harness controller.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 20
Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph Watch
An IDE plugin that visualizes AI coding history as a graph, enabling developers to explore, compare, and merge AI-generated code states.
GitHub stars n/a Velocity flat History 1 snapshot AI-Assisted Programming Apr 20
Curvature-Aware PCA with Geodesic Tangent Space Aggregation for Semi-Supervised Learning Ignore
Geodesic Tangent Space Aggregation PCA offers a geometry-aware dimensionality reduction method that improves semi-supervised learning.
GitHub stars n/a Velocity flat History 1 snapshot Representation Learning Apr 20 Code
Quantum inspired qubit qutrit neural networks for real time financial forecasting Ignore
Quantum-inspired neural networks offer faster and more robust financial forecasting with reduced training times.
GitHub stars n/a Velocity flat History 1 snapshot Financial AI Apr 20 Code
Experiments or Outcomes? Probing Scientific Feasibility in Large Language Models Ignore
This research probes the scientific feasibility assessment capabilities of LLMs, revealing that outcome evidence is more reliable than experiment descriptions for improving accuracy.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 20 Code
Multi-Level Temporal Graph Networks with Local-Global Fusion for Industrial Fault Diagnosis Ignore
A multi-level temporal graph network with local-global fusion is proposed for superior industrial fault diagnosis, outperforming baselines on complex fault scenarios.
GitHub stars n/a Velocity flat History 1 snapshot Industrial AI Apr 20 Code
Formally Verified Patent Analysis via Dependent Type Theory: Machine-Checkable Certificates from a Hybrid AI + Lean 4 Pipeline Ignore
A formally verified pipeline using dependent type theory to analyze patents, providing machine-checkable certificates for IP analysis.
GitHub stars n/a Velocity flat History 1 snapshot Formal Verification Apr 20 Code
Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI Ignore
Analysis of how LLMs are changing academic peer review, leading to more fluent but less deep evaluations.
GitHub stars n/a Velocity flat History 1 snapshot LLM Impact Analysis Apr 21
The Cost of Relaxation: Evaluating the Error in Convex Neural Network Verification Ignore
This research analyzes the error introduced by convex relaxations in neural network verification, providing theoretical bounds on the divergence between relaxed and original network outputs.
GitHub stars n/a Velocity flat History 1 snapshot AI Safety & Verification Apr 20 Pending
A Proxy Consistency Loss for Grounded Fusion of Earth Observation and Location Encoders Ignore
A novel loss function that leverages proxy geographic data to improve the accuracy of Earth observation models with limited labeled data.
GitHub stars n/a Velocity flat History 1 snapshot Geospatial AI Apr 20 Pending
Temporal UI State Inconsistency in Desktop GUI Agents: Formalizing and Defending Against TOCTOU Attacks on Computer-Use Agents Ignore
A formalization and defense against UI state manipulation attacks on desktop GUI agents, addressing the observation-to-action gap.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20 Pending
Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training Ignore
A novel intrinsic reward mechanism for world model training that improves exploration by focusing on cumulative prediction error.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 20 Pending
Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations Ignore
A new visualization tool helps users understand the distribution of language model outputs, improving prompt iteration and evaluation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Visualization Apr 20
Gradient-Based Program Synthesis with Neurally Interpreted Languages Ignore
A novel neural network architecture that learns its own discrete programming language and uses differentiable execution for end-to-end training and test-time adaptation.
GitHub stars n/a Velocity flat History 1 snapshot Program Synthesis Apr 20 Code
Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments Ignore
A comparative analysis of LLM quantization methods, highlighting reproducibility issues and clarifying differences between RaBitQ and TurboQuant.
GitHub stars n/a Velocity flat History 1 snapshot LLM Quantization Apr 21
Lyapunov-Certified Direct Switching Theory for Q-Learning Ignore
Theoretical analysis of Q-learning using a direct stochastic switching system representation for improved convergence bounds.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Theory Apr 21
Geometric Decoupling: Diagnosing the Structural Instability of Latent Ignore
A geometric framework diagnoses latent diffusion model instability by decoupling local scaling and complexity, identifying 'Geometric Hotspots' as the root cause of unreliable generation.
GitHub stars n/a Velocity flat History 1 snapshot Generative Models Apr 20
Where Fake Citations Are Made: Tracing Field-Level Hallucination to Specific Neurons in LLMs Ignore
Identifies and causally links specific neurons in LLMs to citation hallucination, suggesting a method for detection and mitigation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Hallucination Apr 20
Plausible Reasoning and First-Order Plausible Logic Ignore
A first-order logic, called Plausible Logic (PL), designed for defeasible reasoning without probabilities, with 8 reasoning algorithms.
GitHub stars n/a Velocity flat History 1 snapshot Logic and Reasoning Apr 21
AI scientists produce results without reasoning scientifically Ignore
Current LLM-based agents execute scientific workflows but do not exhibit the epistemic patterns that characterize scientific reasoning, leading to unreliable knowledge generation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Agents Apr 20
Regulating Artificial Intimacy: From Locks and Blocks to Relational Accountability Ignore
This paper critically examines emerging regulations for companion chatbots, proposing a general duty of care to address power asymmetries and control through intimacy.
GitHub stars n/a Velocity flat History 1 snapshot AI Regulation & Ethics Apr 20
The Triadic Loop: A Framework for Negotiating Alignment in AI Co-hosted Livestreaming Ignore
A conceptual framework for understanding and designing AI co-hosts in livestreaming that adapt bidirectionally with streamers and audiences.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 20