VLA Foundry: A Unified Framework for Training Vision-Language-Action Models Build Now
VLA Foundry: A unified framework for training vision-language-action models in robotics.
GitHub stars n/a Velocity flat History 1 snapshot AI Frameworks Apr 21 Pending High viability
A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding Build Now
A-MAR is an agent-based multimodal retrieval framework for fine-grained artwork understanding, enabling interpretable and grounded explanations.
GitHub stars n/a Velocity flat History pending Agents Apr 21 Pending High viability
DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling Build Now
An iterative training framework that constructs debiased multimodal preference data and curates existing datasets to achieve state-of-the-art performance in multimodal reward modeling.
GitHub stars n/a Velocity flat History pending Multimodal AI Apr 21 Pending High viability
SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization Build Now
SCURank enhances summarization by ranking candidate summaries based on content units, outperforming traditional metrics and LLM-based methods.
GitHub stars n/a Velocity flat History pending LLM Applications Apr 21 Pending High viability
AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos Build Now
A controllable framework for generating adverse weather automotive videos that significantly improves perception robustness for autonomous driving.
GitHub stars n/a Velocity flat History pending Generative Video Apr 21 Pending High viability
From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning Build Now
A multi-agent framework that learns and reuses optimization strategies for generative engines to improve answer quality and citation accuracy.
GitHub stars n/a Velocity flat History pending Generative AI Optimization Apr 21 Pending High viability
Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents Build Now
A novel four-axis framework for evaluating enterprise AI agents, measuring factual precision, reasoning coherence, compliance reconstruction, and calibrated abstention to ensure alignment with regulatory and decisional standards.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 21 Pending High viability
FASTER: Value-Guided Sampling for Fast RL Build Now
Develop a reinforcement learning tool that leverages value-guided sampling for improved efficiency and scalability.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 21 Pending High viability
Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language Build Now
A benchmark and agentic framework to automate the generation of executable visual workflows from natural language, addressing costly manual engineering.
GitHub stars n/a Velocity flat History pending Workflow Generation Apr 21 Pending High viability
SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning Build Now
A parameter-efficient fine-tuning framework for LLMs that uses semantic awareness to route inputs to specialized experts and adaptively scales their contributions for better task performance.
GitHub stars n/a Velocity flat History pending LLM Fine-tuning Apr 21 Pending High viability
IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text Build Now
A new benchmark and evaluation framework for assessing LLM performance on Indian financial regulatory text, with publicly available code and dataset.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 21 Pending High viability
Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control Build Now
Develop a tool that steers LLM activations toward safer outputs using local linear models.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment and Control Apr 21 Pending High viability
Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning Build Now
This research evaluates whether advanced LLMs 'game' formalization by generating logically valid but unfaithful proofs, offering a method to detect and differentiate these unfaithfulness modes.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 21 Pending High viability
GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models Build Now
GRASPrune efficiently prunes large language models by 50% with minimal performance loss, reducing serving costs.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 21 Code High viability
Reasoning-Aware AIGC Detection via Alignment and Reinforcement Build Now
A reasoning-aware framework for detecting AI-generated content with interpretable explanations, leveraging a novel dataset and reinforcement learning for improved accuracy and transparency.
GitHub stars n/a Velocity flat History 1 snapshot AI Generated Content Detection Apr 21 Code High viability
Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs Build Now
A new multilingual test set and evaluation methodology to expose and quantify implicit local and global biases in LLMs.
GitHub stars n/a Velocity flat History pending LLM Bias Detection Apr 21 Code High viability
Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery Build Now
An adversarial multi-agent system to improve LLM-assisted defect discovery by filtering false positives and enhancing credibility.
GitHub stars n/a Velocity flat History pending LLM Security Apr 21 Pending High viability
ST-Prune: Training-Free Spatio-Temporal Token Pruning for Vision-Language Models in Autonomous Driving Build Now
A training-free, plug-and-play framework that significantly reduces computational overhead for vision-language models in autonomous driving by intelligently pruning spatio-temporal tokens, achieving near-lossless performance at 90% reduction.
GitHub stars n/a Velocity flat History pending Vision-Language Models for Autonomous Driving Apr 21 Code High viability
Gated Memory Policy Build Now
A visuomotor policy for robotic manipulation that learns to selectively recall and construct memory for complex, non-Markovian tasks.
GitHub stars n/a Velocity flat History pending Robotics Apr 21 Code High viability
OLLM: Options-based Large Language Models Build Now
OLLM enhances LLM controllability and robustness by replacing single next-token prediction with a set of learned options, enabling more efficient and aligned generation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Control and Reasoning Apr 21 Code High viability
Multi-modal Test-time Adaptation via Adaptive Probabilistic Gaussian Calibration Build Now
This paper introduces a novel multi-modal test-time adaptation method that uses adaptive probabilistic Gaussian calibration to improve resilience against distribution shifts.
GitHub stars n/a Velocity flat History pending Multi-modal Test-time Adaptation Apr 21 Pending High viability
Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest Build Now
This research provides a comprehensive, reproducible benchmark for evaluating LLMs on social media analytics tasks, including authorship verification, post generation, and user attribute inference, with code and data released.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 21 Code High viability
Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic Build Now
A novel method and dataset for visual semantic arithmetic, enabling robots to ground symbolic reasoning in perception for improved interaction.
GitHub stars n/a Velocity flat History pending Multimodal Reasoning Apr 21 Code High viability
Time Series Augmented Generation for Financial Applications Build Now
Introduces a novel evaluation framework and benchmark for assessing LLM agent reasoning in financial time-series analysis, demonstrating near-perfect tool-use accuracy with minimal hallucination.
GitHub stars n/a Velocity flat History pending LLM Agents for Finance Apr 21 Code High viability
FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion Build Now
A federated adaptation framework that uses a proxy SLM and heterogeneity-aware fusion to enable secure, high-performance fine-tuning of LLMs without compromising client privacy or LLM IP.
GitHub stars n/a Velocity flat History 1 snapshot Federated LLM Fine-Tuning Apr 21 Code High viability
EgoSelf: From Memory to Personalized Egocentric Assistant Build Now
EgoSelf is a personalized egocentric assistant that uses user-specific interaction memory to predict future behaviors and provide tailored assistance.
GitHub stars n/a Velocity flat History 1 snapshot Personalized AI Assistants Apr 21 Code High viability
ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety Build Now
ProjLens provides an interpretability framework to demystify backdoor vulnerabilities in multimodal LLMs, focusing on the role of projector fine-tuning.
GitHub stars n/a Velocity flat History pending Multimodal AI Safety Apr 21 Code High viability
LePREC: Reasoning as Classification over Structured Factors for Assessing Relevance of Legal Issues Build Now
LePREC: A neuro-symbolic framework that combines LLM-generated question-answer pairs with statistical classification to improve legal issue identification accuracy and interpretability.
GitHub stars n/a Velocity flat History pending Legal AI Apr 21 Code High viability
Beyond Semantic Similarity: A Component-Wise Evaluation Framework for Medical Question Answering Systems with Health Equity Implications Build Now
A novel component-wise evaluation framework for medical QA systems that reveals significant health equity risks and model performance disparities.
GitHub stars n/a Velocity flat History pending Medical AI Evaluation Apr 21 Code High viability
On Accelerating Grounded Code Development for Research Build Now
A framework that provides coding agents with instant access to research repositories and technical documentation, enabling real-time, context-aware operation for specialized scientific and technical workflows.
GitHub stars n/a Velocity flat History pending AI Agents Apr 21 Code High viability
A new benchmark for evaluating AI agents on complex, cross-application business workflows via API orchestration.
GitHub stars n/a Velocity flat History pending Agents Apr 21 Code High viability
The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models Build Now
A tool to systematically analyze and quantify verbal tics in LLMs, enabling developers to build more natural and less repetitive AI interactions.
GitHub stars n/a Velocity flat History pending LLM Analysis Apr 21 Code High viability
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution Build Now
A principled framework for training language agents with social intelligence by attributing rewards using Shapley values for fair and effective multi-turn dialogue outcomes.
GitHub stars n/a Velocity flat History pending Agents Apr 21 Code High viability
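SAVOIR attributes rewards using Shapley values. The summary does not say what the "players" are or how a coalition's value is measured, so the sketch below only illustrates the standard exact Shapley formula over a small set of hypothetical players (for example, dialogue turns) and a caller-supplied value function. The names `shapley_values` and the example value function are illustrative assumptions, not the paper's code. Exact enumeration is exponential in the number of players, so this is only practical for a handful of them.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Hashable, Sequence


def shapley_values(
    players: Sequence[Hashable],
    value: Callable[[FrozenSet[Hashable]], float],
) -> Dict[Hashable, float]:
    """Exact Shapley values: weighted average of each player's marginal contribution.

    phi_i = sum over S subset of N \\ {i} of |S|! (n - |S| - 1)! / n! * [v(S + {i}) - v(S)]
    """
    n = len(players)
    phi: Dict[Hashable, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight depends only on coalition size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


if __name__ == "__main__":
    # Hypothetical outcome: reward 1.0 only if turns t1 and t2 both occur.
    def outcome(coalition: FrozenSet[Hashable]) -> float:
        return 1.0 if {"t1", "t2"} <= coalition else 0.0

    print(shapley_values(["t1", "t2", "t3"], outcome))
```

In this toy case the reward is split between the two turns that jointly produce it, and the irrelevant turn receives nothing; the values always sum to v(all players) minus v(empty set).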
Multi-Cycle Spatio-Temporal Adaptation in Human-Robot Teaming Build Now
A framework that unifies task and motion adaptation for more efficient and fluid human-robot collaboration, validated in simulation and on a physical robot.
GitHub stars n/a Velocity flat History pending Human-Robot Teaming Apr 21 Code High viability
Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model Build Now
Develops a novel autoregressive model for real-time target speaker extraction that maintains high intelligibility and stability in streaming scenarios, outperforming offline baselines.
GitHub stars n/a Velocity flat History pending Real-time Audio Processing Apr 21 Code High viability
RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation Build Now
RoboWM-Bench is a new benchmark for evaluating world models in robotic manipulation, focusing on physically executable behaviors and task completion.
GitHub stars n/a Velocity flat History pending Robotic Manipulation Benchmarks Apr 21 Code High viability
Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges Build Now
An open-source benchmark and evaluation framework for LLM agents in realistic cybersecurity Capture The Flag challenges, revealing current limitations and providing a path for improvement.
GitHub stars n/a Velocity flat History pending Agents Apr 21 Code High viability
M²GRPO: Mamba-based Multi-Agent Group Relative Policy Optimization for Biomimetic Underwater Robots Pursuit Build Now
A Mamba-based multi-agent policy optimization framework for cooperative underwater robot pursuit, outperforming baselines in simulations and real-world tests.
GitHub stars n/a Velocity flat History pending Robotics Apr 21 Code High viability
CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation Build Now
CoDA enables LLMs to effectively transfer knowledge across domains by aligning latent reasoning representations, significantly improving performance in expertise-scarce areas.
GitHub stars n/a Velocity flat History pending LLM Domain Adaptation Apr 21 Code High viability
Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning Build Now
Leveraging Low-Rank Adaptation (LoRA) to improve critic capacity and stability in off-policy reinforcement learning, demonstrating consistent gains in critic loss and policy performance.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 21 Code High viability
LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation Build Now
A lightweight binarization framework for large language models that enables efficient deployment in resource-constrained environments through a novel three-stage distillation strategy.
GitHub stars n/a Velocity flat History pending LLM Quantization Apr 21 Code High viability
Personalized Benchmarking: Evaluating LLMs by Individual Preferences Build Now
This research introduces personalized benchmarking for LLMs, demonstrating that individual user preferences diverge significantly from aggregate rankings and can be predicted using topic and style features.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 21 Code High viability
SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models Watch
SafetyALFRED enhances safety-conscious planning in multimodal models, improving AI interaction safety.
GitHub stars n/a Velocity flat History 1 snapshot Safety in AI Apr 21 Pending
PLaMo 2.1-VL Technical Report Build Now
A lightweight, deployable Vision Language Model optimized for Japanese language and edge devices, excelling in VQA and anomaly detection for industrial applications.
GitHub stars n/a Velocity flat History pending Vision Language Models Apr 21 Code High viability
SimDiff: Depth Pruning via Similarity and Difference Build Now
A novel layer importance criterion for LLM depth pruning that significantly outperforms state-of-the-art, enabling faster inference with minimal performance loss.
GitHub stars n/a Velocity flat History pending LLM Optimization Apr 21 Code High viability
Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment Build Now
A novel framework to mitigate cognitive bias in LLM agents by enforcing perspective-invariant reasoning through dialectical alignment, improving fault resolution in ambiguous scenarios.
GitHub stars n/a Velocity flat History pending Agents Apr 21 Code High viability
SAHM: A Benchmark for Arabic Financial and Shari'ah-Compliant Reasoning Build Now
SAHM is a new benchmark and dataset for Arabic financial and Shari'ah-compliant reasoning, with an instruction-tuned model to improve LLM performance in this domain.
GitHub stars n/a Velocity flat History pending Arabic Financial NLP Apr 21 Code High viability
Industrial Surface Defect Detection via Diffusion Generation and Asymmetric Student-Teacher Network Build Now
An unsupervised industrial defect detection system using diffusion models for data generation and an asymmetric teacher-student network for precise localization.
GitHub stars n/a Velocity flat History pending Industrial AI Apr 21 Code High viability
Towards Scalable Lifelong Knowledge Editing with Selective Knowledge Suppression Build Now
LightEdit offers a scalable and cost-effective solution for lifelong knowledge editing in LLMs by selectively suppressing outdated information and incorporating new knowledge.
GitHub stars n/a Velocity flat History pending LLM Knowledge Editing Apr 21 Code High viability
DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning Build Now
DW-Bench is a new benchmark for evaluating LLMs on graph-topology reasoning over data warehouse schemas; its results show that tool-augmented methods outperform static approaches.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 21 Code High viability
CoCo-SAM3: Harnessing Concept Conflict in Open-Vocabulary Semantic Segmentation Build Now
CoCo-SAM3 enhances open-vocabulary semantic segmentation by resolving concept conflicts, improving consistency and accuracy without retraining.
GitHub stars n/a Velocity flat History pending Open-Vocabulary Semantic Segmentation Apr 21 Code High viability
Reinforcement Learning Enabled Adaptive Multi-Task Control for Bipedal Soccer Robots Build Now
A modular RL framework enables bipedal soccer robots to adaptively control multiple tasks, combining gait generation with a posture-driven state machine for stable ball seeking and rapid fall recovery.
GitHub stars n/a Velocity flat History pending Robotics Apr 21 Code High viability
UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling Build Now
UniT is a framework that bridges the gap between human and humanoid robot learning by creating a unified physical language for policy learning and world modeling.
GitHub stars n/a Velocity flat History pending Robotics Apr 21 Code High viability
Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling Build Now
Nexusformer introduces nonlinear attention expansion for stable and inheritable transformer scaling, enabling lossless growth and improved efficiency without retraining from scratch.
GitHub stars n/a Velocity flat History pending Transformer Scaling Apr 21 Code High viability
Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization Build Now
A unified ASR framework with consistency regularization reduces the gap between offline and streaming performance, offering a single model for both use cases.
GitHub stars n/a Velocity flat History pending Speech Recognition Apr 21 Code High viability
Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees Build Now
A neuro-symbolic framework that restructures statement autoformalization into a modular pipeline for improved accuracy and error localization.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 21 Code High viability
RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models Build Now
A geometry-driven method for parameter-efficient LLM adaptation that identifies critical layers using the Ramer-Douglas-Peucker algorithm, significantly improving performance with fewer adapted parameters.
GitHub stars n/a Velocity flat History pending LLM Adaptation Apr 21 Code High viability
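RDP LoRA relies on the Ramer-Douglas-Peucker algorithm, which simplifies a polyline by keeping only the points that deviate most from a straight chord. The summary does not say which per-layer quantity is turned into a curve, so the sketch below shows only the classic algorithm itself; treating layers as hypothetical `(layer_index, score)` points is an assumption, and the scores in the usage example are made up.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the infinite line through a and b."""
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate chord: fall back to point-to-point distance.
        return math.hypot(x0 - x1, y0 - y1)
    return abs(dy * x0 - dx * y0 + x2 * y1 - y2 * x1) / length


def rdp(points: List[Point], epsilon: float) -> List[Point]:
    """Ramer-Douglas-Peucker polyline simplification."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior point farthest from the start-end chord.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _perpendicular_distance(points[i], start, end)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist > epsilon:
        # Keep that point and simplify each half recursively.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # avoid duplicating the split point
    return [start, end]


if __name__ == "__main__":
    # Hypothetical per-layer scores; not data from the paper.
    layer_scores = [(0, 0.10), (1, 0.12), (2, 0.15), (3, 0.90), (4, 0.95), (5, 0.97)]
    print(rdp(layer_scores, epsilon=0.05))
```

The points that survive simplification mark where the curve bends sharply, which is the intuition behind using RDP to pick out a small set of "critical" positions.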
ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning Build Now
ShadowPEFT introduces a centralized parameter-efficient fine-tuning framework for LLMs that reuses a shared module across layers, improving efficiency and flexibility.
GitHub stars n/a Velocity flat History pending LLM Fine-Tuning Apr 21 Code High viability
A neural operator framework for data-driven discovery of stability and receptivity in physical systems Build Now
A data-driven neural operator framework that automatically discovers stability and receptivity properties in complex physical systems without requiring governing equations.
GitHub stars n/a Velocity flat History pending Physics AI Apr 21 Code High viability
UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction Build Now
A unified audio front-end LLM that handles voice activity detection, turn-taking, speaker recognition, and ASR for seamless full-duplex speech interaction.
GitHub stars n/a Velocity flat History 1 snapshot Conversational AI Apr 21 Code High viability
Multi-Gait Learning for Humanoid Robots Using Reinforcement Learning with Selective Adversarial Motion Prior Build Now
This approach enables humanoid robots to learn five distinct gaits using a unified RL framework with a selective adversarial motion prior, improving stability and dynamic expressiveness.
GitHub stars n/a Velocity flat History pending Robotics Apr 21 Code High viability
Environmental Sound Deepfake Detection Using Deep-Learning Framework Build Now
A deep learning framework for environmental sound deepfake detection that significantly outperforms existing methods by finetuning pre-trained models.
GitHub stars n/a Velocity flat History pending Audio Deepfake Detection Apr 21 Code High viability
Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports Build Now
Reinforcement learning refines LLM predictions for disease classification from radiology reports, improving accuracy and reasoning.
GitHub stars n/a Velocity flat History pending Medical AI Apr 21 Code High viability
LASER: Learning Active Sensing for Continuum Field Reconstruction Build Now
LASER is a closed-loop framework that uses reinforcement learning to actively guide sensor placement for high-fidelity continuum field reconstruction.
GitHub stars n/a Velocity flat History pending Active Sensing AI Apr 21 Code High viability
RoLegalGEC: Legal Domain Grammatical Error Detection and Correction Dataset for Romanian Build Now
A novel dataset for grammatical error detection and correction in Romanian legal text, along with evaluated models, aimed at improving the accuracy of legal documents.
GitHub stars n/a Velocity flat History pending NLP Tools Apr 21 Code High viability
TACENR: Task-Agnostic Contrastive Explanations for Node Representations Build Now
TACENR provides task-agnostic explanations for graph node representations by identifying key attribute, proximity, and structural features.
GitHub stars n/a Velocity flat History pending Graph AI Apr 21 Code High viability
Improved Anomaly Detection in Medical Images via Mean Shift Density Enhancement Build Now
A hybrid anomaly detection framework for medical imaging using self-supervised learning and Mean Shift Density Enhancement, achieving state-of-the-art results.
GitHub stars n/a Velocity flat History pending Medical AI Apr 21 Code High viability
DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs Build Now
A privacy-preserving federated learning framework using efficient LLMs for log anomaly detection in distributed systems.
GitHub stars n/a Velocity flat History pending Federated Learning for Security Apr 21 Code High viability
RARE: Redundancy-Aware Retrieval Evaluation Framework for High-Similarity Corpora Build Now
A framework for evaluating retrieval-augmented generation systems on real-world, redundant corpora by tracking atomic facts and improving LLM data generation reliability.
GitHub stars n/a Velocity flat History pending RAG Evaluation Apr 21 Code High viability
Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps Build Now
A lightweight system to detect hallucinations in speech models at inference time using attention maps, improving safety and reliability.
GitHub stars n/a Velocity flat History pending Speech AI Apr 21 Code High viability
Co-Refine: AI-Powered Tool Supporting Qualitative Analysis Build Now
An AI platform that provides real-time feedback on coding consistency for qualitative researchers, reducing interpretation drift.
GitHub stars n/a Velocity flat History pending AI-Powered Tools Apr 21 Code High viability
Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps Build Now
An LLM agent benchmark for evaluating threat hunting capabilities in cybersecurity, revealing current model limitations.
GitHub stars n/a Velocity flat History pending Agents Apr 21 Code High viability
Learning Hybrid-Control Policies for High-Precision In-Contact Manipulation Under Uncertainty Build Now
MATCH learns hybrid position-force control policies for high-precision in-contact manipulation under uncertainty, improving success rates and reducing damage.
GitHub stars n/a Velocity flat History pending Robotics Apr 21 Code High viability
Think Before Writing: Feature-Level Multi-Objective Optimization for Generative Citation Visibility Build Now
FeatGEO optimizes generative answer engine visibility by abstracting webpages into interpretable features, outperforming token-level methods while maintaining content quality.
GitHub stars n/a Velocity flat History pending Generative AI Optimization Apr 21 Code High viability
Multimodal Transformer for Sample-Aware Prediction of Metal-Organic Framework Properties Build Now
A multimodal transformer that predicts material properties by considering sample-specific experimental data, improving accuracy over traditional methods.
GitHub stars n/a Velocity flat History pending Materials Science AI Apr 21 Code High viability
Inductive Subgraphs as Shortcuts: Causal Disentanglement for Heterophilic Graph Learning Watch
A causal inference framework, CD-GNN, that disentangles spurious inductive subgraphs to improve robustness and accuracy in heterophilic graph learning.
GitHub stars n/a Velocity flat History pending Graph Learning Apr 21 Code
Sherpa.ai Privacy-Preserving Multi-Party Entity Alignment without Intersection Disclosure for Noisy Identifiers Watch
A multi-party privacy-preserving entity alignment protocol for Vertical Federated Learning that hides intersection membership and supports noisy identifiers.
GitHub stars n/a Velocity flat History pending Privacy-Preserving AI Apr 21 Code
R²-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction Watch
Accelerates Diffusion Large Language Models by reducing spatio-temporal redundancy during decoding, leading to faster inference with maintained quality.
GitHub stars n/a Velocity flat History pending LLM Inference Apr 21 Code
Reasoning Structure Matters for Safety Alignment of Reasoning Models Watch
AltTrain is a post-training method that modifies the reasoning structure of large reasoning models to improve safety alignment without complex RL, using only supervised finetuning on 1K examples.
GitHub stars n/a Velocity flat History pending LLM Safety Apr 21 Code
Revac: A Social Deduction Reasoning Agent Watch
A winning AI agent for social deduction games that uses memory, social graph analysis, and adaptive communication.
Agents Apr 21 High viability
HP-Edit: A Human-Preference Post-Training Framework for Image Editing Build Now
A post-training framework and dataset for aligning image editing models with human preferences, significantly improving output quality.
GitHub stars n/a Velocity flat History pending Image Editing Apr 21 Code High viability
Design Rules for Extreme-Edge Scientific Computing on AI Engines Watch
This work provides design rules and a new metric (LARE) to determine when AI Engines on FPGAs are superior to programmable logic for extreme-edge scientific computing.
GitHub stars n/a Velocity flat History pending Edge AI Hardware Apr 21 Code
Fine-Tuning Small Reasoning Models for Quantum Field Theory Watch
This study fine-tunes small reasoning models for Quantum Field Theory using a novel data generation pipeline and publicly releases the data, pipeline, and reasoning traces.
GitHub stars n/a Velocity flat History pending LLM Fine-Tuning Apr 21 Code
Enhancing Construction Worker Safety in Extreme Heat: A Machine Learning Approach Utilizing Wearable Technology for Predictive Health Analytics Watch
An AI system using wearable data to predict heat stress in construction workers, enhancing safety with high accuracy.
Wearable Health AI Apr 21 High viability
Product-of-Experts Training Reduces Dataset Artifacts in Natural Language Inference Watch
Product-of-Experts training reduces dataset artifacts in Natural Language Inference models by downweighting overconfident predictions.
GitHub stars n/a Velocity flat History pending Natural Language Inference Apr 21 Code
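Product-of-Experts debiasing is commonly formulated as follows (this formulation is an assumption, not taken from the paper's code): add the log-probabilities of the main model and a frozen bias-only model, renormalize, and train the main model with cross-entropy on that combined distribution. Examples the bias model already gets right then contribute a small loss, which is how they are downweighted. The pure-Python sketch below computes the per-example loss only; in real training, gradients flow to the main model alone.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def ce_loss(main_logits: Sequence[float], label: int) -> float:
    """Plain cross-entropy of the main model alone."""
    return -log_softmax(main_logits)[label]


def poe_loss(main_logits: Sequence[float], bias_log_probs: Sequence[float], label: int) -> float:
    """Product-of-Experts loss: combine experts in log space, renormalize, take NLL."""
    main_lp = log_softmax(main_logits)
    # log p_main + log p_bias corresponds to multiplying the two distributions.
    combined = [a + b for a, b in zip(main_lp, bias_log_probs)]
    return -log_softmax(combined)[label]


if __name__ == "__main__":
    main = [0.0, 0.0, 0.0]  # hypothetical, undecided main model
    bias_right = log_softmax([5.0, 0.0, 0.0])  # bias-only model confident in gold label 0
    print(ce_loss(main, 0), poe_loss(main, bias_right, 0))
```

When the bias-only model is uniform, the PoE loss reduces to ordinary cross-entropy; when it is confidently correct, the loss (and gradient) for that example shrinks.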
CulturALL: Benchmarking Multilingual and Multicultural Competence of LLMs on Grounded Tasks Ignore
A new benchmark, CulturALL, evaluates LLMs' multilingual and multicultural competence on complex, real-world tasks, revealing significant room for improvement.
GitHub stars n/a Velocity flat History pending LLM Benchmarking Apr 21 Code
When Graph Structure Becomes a Liability: A Critical Re-Evaluation of Graph Neural Networks for Bitcoin Fraud Detection under Temporal Distribution Shift Ignore
A rigorous evaluation of Graph Neural Networks for Bitcoin fraud detection reveals that simpler models outperform GNNs under realistic temporal shifts, with code and a new protocol released for reproducible research.
GitHub stars n/a Velocity flat History pending Fraud Detection Apr 21 Code
Adaptive MSD-Splitting: Enhancing C4.5 and Random Forests for Skewed Continuous Attributes Watch
Adaptive MSD-Splitting enhances C4.5 and Random Forests for skewed continuous attributes, improving accuracy and efficiency.
GitHub stars n/a Velocity flat History pending Decision Trees Apr 21 Code
Detecting Data Contamination in Large Language Models Ignore
A study evaluating existing methods for detecting data contamination in LLMs, finding current black-box approaches unreliable.
GitHub stars n/a Velocity flat History pending LLM Security Apr 21 Code
Streamliners for Answer Set Programming Ignore
This paper adapts LLMs to generate 'streamliner' constraints for Answer Set Programming, improving performance on benchmark problems.
GitHub stars n/a Velocity flat History pending AI for Programming Apr 21 Code
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models Ignore
Introduces HalluAudio, a benchmark for detecting hallucinations in large audio-language models across speech, sound, and music.
GitHub stars n/a Velocity flat History pending Audio LLMs Apr 21 Code
Mesh Memory Protocol: Semantic Infrastructure for Multi-Agent LLM Systems Watch
A Mesh Memory Protocol that provides semantic infrastructure for multi-agent LLM systems, enabling cross-session cognitive collaboration and collective intelligence.
Agents Apr 21
Fairness Audits of Institutional Risk Models in Deployed ML Pipelines Watch
A replicable methodology for auditing deployed institutional risk models to uncover and quantify fairness disparities across student demographics in higher education.
Fairness Audits Apr 21
Revisiting Catastrophic Forgetting in Continual Knowledge Graph Embedding Ignore
Identifies and addresses a critical 'entity interference' issue in continual knowledge graph embedding evaluation, revealing performance overestimations.
GitHub stars n/a Velocity flat History pending Knowledge Graphs Apr 21 Code
Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs Ignore
This study reveals significant differences in how LLMs handle conversational repair, highlighting their distinct and sometimes unreliable multi-turn behavior.
GitHub stars n/a Velocity flat History pending LLM Behavior Analysis Apr 21 Code
Tadabur: A Large-Scale Quran Audio Dataset Ignore
A large-scale dataset of Quranic recitation audio to advance research in Quranic speech analysis.
GitHub stars n/a Velocity flat History pending Audio AI Apr 21 Code
Has Automated Essay Scoring Reached Sufficient Accuracy? Deriving Achievable QWK Ceilings from Classical Test Theory Ignore
This research establishes theoretical and practical accuracy ceilings for automated essay scoring, providing a new benchmark for evaluating and deploying AI in educational assessment.
GitHub stars n/a Velocity flat History pending AI Evaluation Apr 21 Code
Explicit Trait Inference for Multi-Agent Coordination Ignore
A psychologically grounded method for LLM agents to infer and track partner traits, significantly improving multi-agent coordination.
Agents Apr 21
AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories Ignore
AblateCell is an agent that automates the process of reproducing and ablating AI models in virtual cell repositories, improving the verification and attribution of critical components.
AI Agents for Scientific Research Apr 21
GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes Ignore
GOLD-BEV is a framework for dense semantic BEV mapping of dynamic scenes using synchronized aerial and ground data, with a novel approach to generate pseudo-aerial labels for scalable annotation.
Computer Vision Apr 21
Evaluation-driven Scaling for Scientific Discovery Ignore
A framework for scaling evaluation-driven discovery loops in LLMs, demonstrating significant gains across scientific problems and improving model generalization.
LLM Training Apr 21
Towards Energy Impact on AI-Powered 6G IoT Networks: Centralized vs. Decentralized Ignore
A comparative analysis of centralized vs. decentralized AI architectures for 6G IoT networks, showing distributed models reduce energy consumption by up to 70% while maintaining predictive accuracy.
IoT AI Apr 21
Distillation Traps and Guards: A Calibration Knob for LLM Distillability Ignore
A post-hoc calibration method that uses reinforcement fine-tuning of the teacher model to tune its distillability, either to improve student performance or to protect intellectual property.
LLM Distillation Apr 21
Benign Overfitting in Adversarial Training for Vision Transformers Ignore
A theoretical analysis of adversarial training for Vision Transformers that identifies the conditions for benign overfitting and how they relate to robustness.
GitHub stars n/a Velocity flat History pending Computer Vision Apr 21 Code
S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection Ignore
A semi-supervised meta-additive model that automatically identifies informative variables and updates similarity matrices for robust predictions.
GitHub stars n/a Velocity flat History pending Statistical Modeling Apr 21 Code
Self-Improving Tabular Language Models via Iterative Group Alignment Ignore
A self-improving framework for tabular data generation that uses automated feedback to iteratively enhance model quality and learn from its own synthetic samples.
Tabular Data Generation Apr 21
Large Language Models Exhibit Normative Conformity Ignore
Investigates normative conformity in LLMs within multi-agent systems and suggests that this conformity could be exploited for manipulation.
GitHub stars n/a Velocity flat History pending LLM Agents Apr 21 Code
Generalization at the Edge of Stability Ignore
This research theoretically explores the 'sharpness dimension' to understand generalization in neural networks trained with large learning rates, offering insights into chaotic optimization regimes.
GitHub stars n/a Velocity flat History pending LLM Training Apr 21 Code
An AI Agent Execution Environment to Safeguard User Data Ignore
An execution environment that guarantees user data confidentiality for AI agents by enforcing user-defined permission specifications without trusting the agent.
AI Agent Security Apr 21
Attention-based Multi-modal Deep Learning Model of Spatio-temporal Crop Yield Prediction with Satellite, Soil and Climate Data Ignore
An attention-based multi-modal deep learning framework for spatio-temporal crop yield prediction using satellite, soil, and climate data.
Agricultural AI Apr 21
BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps Ignore
A novel tokenization method that encodes symbolic music in uniform temporal steps, improving generation quality and efficiency.
Generative Music Apr 21
Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms Ignore
Analyzes the trade-offs of deploying small language models with agent paradigms for cost-efficient real-world applications.
LLM Deployment Apr 21
Cross-Model Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Across Three Large Language Models Ignore
Compares the repeated generation consistency of exercise prescriptions across three LLMs, revealing fundamentally different generative behaviors that impact reliable deployment.
LLM Consistency Analysis Apr 21
How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning Ignore
Analyzes how answer tokens in thinking LLMs read their own reasoning traces and uses these self-reading patterns in a training-free steering method that improves quantitative reasoning accuracy.
LLM Reasoning Apr 21
EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training Ignore
A novel reinforcement learning algorithm for LLM post-training that adaptively optimizes baseline selection to reduce variance and improve performance across various tasks.
LLM Training Apr 21
Intentional Updates for Streaming Reinforcement Learning Ignore
A novel approach to streaming reinforcement learning that aims for predictable per-step changes in function output, leading to state-of-the-art performance.
Reinforcement Learning Apr 21
A Dual Perspective on Synthetic Trajectory Generators: Utility Framework and Privacy Vulnerabilities Ignore
A framework for evaluating the utility of synthetic trajectory generators and demonstrating privacy vulnerabilities through adversarial attacks.
Synthetic Data Privacy Apr 21
Learning Lifted Action Models from Unsupervised Visual Traces Ignore
A deep learning framework that learns lifted action models from unsupervised visual traces by jointly predicting states and actions, using a mixed-integer linear program to enforce consistency.
AI Planning Apr 21
Integrating Anomaly Detection into Agentic AI for Proactive Risk Management in Human Activity Ignore
A conceptual framework integrating anomaly detection into agentic AI for proactive risk management in human activity, particularly for fall mitigation in elderly populations.
Agents Apr 21
Safety-Critical Contextual Control via Online Riemannian Optimization with World Models Ignore
A theoretical framework for safety-critical contextual control using online Riemannian optimization with world models, improving convergence and safety margins.
Safety-Critical Control Apr 21
ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation Ignore
A human-symbiotic agent paradigm for cross-user autonomous cooperation with layered identity and scoped authorization.
Agents Apr 21
Relational AI in Education: Reciprocity, Participatory Design, and Indigenous Worldviews Ignore
This paper explores how to design AI in education to foster relational learning, drawing inspiration from Indigenous worldviews and participatory design.
AI in Education Apr 21
Counting Worlds Branching Time Semantics for post-hoc Bias Mitigation in generative AI Ignore
This paper introduces a formal logic framework based on branching-time semantics for reasoning about and mitigating bias in generative AI outputs post hoc.
Generative AI Apr 21
Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI Ignore
An analysis of how large language models are changing academic peer review, finding that they lead to more fluent but shallower evaluations.
LLM Impact Analysis Apr 21
Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments Ignore
A symmetric comparison of the RaBitQ and TurboQuant vector quantization methods across method design, theory, and experiments, highlighting reproducibility issues and clarifying their differences.
LLM Quantization Apr 21
Lyapunov-Certified Direct Switching Theory for Q-Learning Ignore
Theoretical analysis of Q-learning using a direct stochastic switching system representation to derive finite-time bounds.
Reinforcement Learning Theory Apr 21
Plausible Reasoning and First-Order Plausible Logic Ignore
A novel first-order logic, Plausible Logic (PL), designed for defeasible reasoning without probabilities and offering eight reasoning algorithms.
Logic and Reasoning Apr 21