MAIC-UI: Making Interactive Courseware with Generative UI Build Now
A zero-code platform that empowers educators to create interactive STEM courseware without programming expertise.
GitHub 100 stars Velocity flat History 1 snapshot Generative UI Apr 28 Pending High viability
M$^3$-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering Build Now
A challenging benchmark for multimodal, multi-entity, multi-hop visual question answering that reveals significant limitations in current large language models.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal VQA Apr 28 Pending High viability
Faithfulness-QA: A Counterfactual Entity Substitution Dataset for Training Context-Faithful RAG Models Build Now
A new dataset and pipeline to train RAG models to prioritize retrieved context over internal knowledge, improving faithfulness.
GitHub 100 stars Velocity flat History 1 snapshot RAG Apr 28 Pending High viability
DATAREEL: Automated Data-Driven Video Story Generation with Animations Build Now
Automate data-driven video story generation with a multi-agent framework and a new benchmark for evaluating animated visualizations and narration.
GitHub stars n/a Velocity flat History 1 snapshot Generative Video Apr 28 Pending High viability
Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization Build Now
An open-source AI framework that uses LLMs and simulation to explore and optimize computer architecture designs, outperforming the state of the art.
GitHub stars n/a Velocity flat History 1 snapshot AI for Hardware Design Apr 28 Code High viability
Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest Build Now
Develop an AI platform for strategic coordination and multiplayer game scenarios to enhance cooperation in competitive environments.
GitHub stars n/a Velocity flat History 1 snapshot AI and Machine Learning Apr 28 Code High viability
Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models Build Now
A plug-and-play method to reduce hallucinations in vision-language models by intervening during the prefill stage, improving initial representations.
GitHub stars n/a Velocity flat History 1 snapshot LLM Hallucination Mitigation Apr 28 Pending High viability
From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems Build Now
A zero-trust security gateway for AI-native enterprise systems that formally validates and audits autonomous agents using semantic fuzzing.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 28 Pending High viability
AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices Build Now
AHASD is a mobile NPU-PIM heterogeneous architecture for speculative decoding that uses task-level decoupling and adaptive control to improve LLM inference throughput and energy efficiency.
GitHub stars n/a Velocity flat History 1 snapshot LLM Inference Optimization Apr 28 Pending High viability
TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline Reinforcement Learning Build Now
TSN-Affinity is a continual offline reinforcement learning method that uses similarity-driven parameter reuse to prevent catastrophic forgetting and improve multi-task performance.
GitHub stars n/a Velocity flat History 1 snapshot Continual RL Apr 28 Pending High viability
Health System Scale Semantic Search Across Unstructured Clinical Notes Build Now
A health-system-scale semantic search system for clinical notes that reduces chart abstraction time by up to 89% with sub-second latency and low operational costs.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 28 Pending High viability
QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention Build Now
QFlash enables end-to-end integer quantization for Vision Transformer attention, achieving significant speedups and energy reduction without accuracy loss.
GitHub stars n/a Velocity flat History 1 snapshot Model Optimization Apr 28 Pending High viability
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery Watch
AutoResearchBench is a challenging new benchmark for evaluating AI agents on complex scientific literature discovery.
GitHub stars n/a Velocity flat History 1 snapshot AI Benchmarking Apr 28 Pending
DiRe-RAPIDS: Topology-faithful dimensionality reduction at scale Build Now
A topology-faithful dimensionality reduction method that preserves global structure better than UMAP at scale, with benchmark results.
GitHub stars n/a Velocity flat History 1 snapshot Dimensionality Reduction Apr 28 Pending High viability
Action-Aware Generative Sequence Modeling for Short Video Recommendation Build Now
A generative sequence network that refines user actions into temporal sequences for nuanced short video recommendations, achieving significant improvements in user engagement and retention.
GitHub stars n/a Velocity flat History 1 snapshot Recommendation Systems Apr 28 Code High viability
Recursive Multi-Agent Systems Build Now
A recursive multi-agent framework that enhances collaboration and efficiency through iterative refinement and latent state transfer, outperforming baselines in reasoning and speed.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Agent Systems Apr 28 Code High viability
GPT-Image-2 in the Wild: A Twitter Dataset of Self-Reported AI-Generated Images from the First Week of Deployment Build Now
A dataset of self-reported AI-generated images from the first week of GPT-Image-2 deployment, revealing challenges in provenance verification on social media.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Apr 28 Code High viability
LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation Build Now
LLM-ReSum is a self-reflective summarization framework that integrates LLM-based evaluation and generation in a feedback loop, improving factual accuracy by up to 33% and coverage by 39%.
GitHub stars n/a Velocity flat History 1 snapshot LLM Summarization Apr 28 Code High viability
Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling Build Now
Open-source, highly sparse multilingual Mixture-of-Experts language models with efficient upcycling and best-in-class performance-to-compute ratio.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28 Code High viability
RESTestBench: A Benchmark for Evaluating the Effectiveness of LLM-Generated REST API Test Cases from NL Requirements Build Now
A benchmark and metric for evaluating LLM-generated REST API tests from natural language requirements, improving functional validation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Testing Apr 28 Code High viability
OxyGent: Making Multi-Agent Systems Modular, Observable, and Evolvable via Oxy Abstraction Build Now
OxyGent is an open-source framework for building modular, observable, and evolvable multi-agent systems with a Lego-like assembly paradigm.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 28 Pending High viability
SAFEdit: Does Multi-Agent Decomposition Resolve the Reliability Challenges of Instructed Code Editing? Build Now
A multi-agent framework that decomposes instructed code editing into specialized roles to improve reliability and reduce unintended changes.
GitHub stars n/a Velocity flat History 1 snapshot Code Editing Agents Apr 28 Code High viability
The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models Build Now
A multi-source benchmark and evaluation pipeline for assessing structured output quality in large language models across text, image, and audio inputs.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 28 Code High viability
Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling Build Now
A fast, zero-shot detector for machine-generated text that uses perplexity under text shuffling to identify structural fragility.
GitHub stars n/a Velocity flat History 1 snapshot AI Text Detection Apr 28 Code High viability
BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate Build Now
Generate synthetic training data for custom LLM guardrails using a multi-agent debate framework, eliminating the need for extensive human annotation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Guardrails Apr 28 Code High viability
From Insight to Action: A Novel Framework for Interpretability-Guided Data Selection in Large Language Models Build Now
A framework that guides data selection for LLM fine-tuning by identifying and activating internal task features, leading to significant data efficiency gains.
GitHub stars n/a Velocity flat History 1 snapshot LLM Fine-tuning Apr 28 Code High viability
PI-TTA: Physics-Informed Source-Free Test-Time Adaptation for Robust Human Activity Recognition on Mobile Devices Build Now
Physics-informed source-free test-time adaptation for robust human activity recognition on mobile devices, improving accuracy and stability.
GitHub stars n/a Velocity flat History 1 snapshot Human Activity Recognition Apr 28 Code High viability
Evaluating Risks in Weak-to-Strong Alignment: A Bias-Variance Perspective Build Now
Analyzes weak-to-strong AI alignment failures using a bias-variance perspective, identifying strong-model variance as a key predictor of deception.
GitHub stars n/a Velocity flat History 1 snapshot AI Alignment Apr 28 Code High viability
LegalMidm: Use-Case-Driven Legal Domain Specialization for Korean Large Language Model Build Now
A Korean legal-domain LLM specialized through use-case-driven dataset construction and an optimized training pipeline, developed in collaboration with legal professionals for precision and reliability.
GitHub stars n/a Velocity flat History 1 snapshot LLM Specialization Apr 28 Pending High viability
PSI-Bench: Towards Clinically Grounded and Interpretable Evaluation of Depression Patient Simulators Build Now
PSI-Bench is an interpretable evaluation framework for depression patient simulators, providing clinically grounded diagnostics to guide future development and improve realism.
GitHub stars n/a Velocity flat History 1 snapshot Mental Health AI Apr 28 Code High viability
Learning Generalizable Multimodal Representations for Software Vulnerability Detection Build Now
A multimodal framework that uses code and comments to significantly improve software vulnerability detection accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Code Intelligence Apr 28 Code High viability
DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams Build Now
DRAGON is a new benchmark and dataset for evaluating evidence-grounded visual reasoning in diagrams, addressing limitations of current VLMs in grounding their predictions.
GitHub stars n/a Velocity flat History 1 snapshot Visual Reasoning Benchmark Apr 28 Code High viability
SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton Build Now
SymphonyGen is a 3D hierarchical framework for generating cinematic orchestral music with controllable harmony skeletons and improved musicality.
GitHub stars n/a Velocity flat History 1 snapshot Generative Music Apr 28 Code High viability
SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials Build Now
A benchmark and fine-tuned LLM for automatically evaluating K-12 science instructional materials.
GitHub stars n/a Velocity flat History 1 snapshot AI for Education Apr 28 Code High viability
Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation Build Now
An LLM agent framework that automates end-to-end 3D cutscene generation by integrating with game engines and orchestrating specialist sub-agents.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Apr 28 Code High viability
JURY-RL: Votes Propose, Proofs Dispose for Label-Free RLVR Build Now
A label-free reinforcement learning framework for LLMs that decouples answer proposal from reward verification, improving mathematical reasoning and code generation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 28 Code High viability
DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing Build Now
A decoupled reinforcement learning framework for reasoning-driven image editing that optimizes planning independently from generation.
GitHub stars n/a Velocity flat History 1 snapshot AI for Image Editing Apr 28 Code High viability
Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings Build Now
A benchmark for semiconductor bandgap prediction that bridges the gap between computational and experimental data, revealing generalization limitations.
GitHub stars n/a Velocity flat History 1 snapshot Materials Science AI Apr 28 Code High viability
GraphPL: Leveraging GNN for Efficient and Robust Modalities Imputation in Patchwork Learning Build Now
GraphPL is a graph neural network approach for robust modality imputation in patchwork learning that achieves state-of-the-art performance on distributed electronic health record data.
GitHub stars n/a Velocity flat History 1 snapshot Multi-modal Learning Apr 28 Code High viability
Think Before You Act -- A Neurocognitive Governance Model for Autonomous AI Agents Build Now
A neurocognitive governance framework for autonomous AI agents that embeds self-governance principles into their reasoning, achieving 95% compliance accuracy in a retail supply chain workflow.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 28 Code High viability
R$^3$-SQL: Ranking Reward and Resampling for Text-to-SQL Build Now
R$^3$-SQL is a Text-to-SQL framework that ranks functionally equivalent queries consistently and uses agentic resampling to improve candidate recall, achieving state-of-the-art execution accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Text-to-SQL Apr 28 Code High viability
Towards Agentic Investigation of Security Alerts Build Now
An agentic workflow using LLMs and structured queries to automate the initial investigation of security alerts, improving accuracy and reducing analyst workload.
GitHub stars n/a Velocity flat History 1 snapshot Security AI Apr 28 Code High viability
At the Edge of the Heart: ULP FPGA-Based CNN for On-Device Cardiac Feature Extraction in Smart Health Sensors for Astronauts Build Now
An ultra-low-power FPGA-based CNN for on-device cardiac feature extraction in smart health sensors, enabling autonomous health monitoring for astronauts.
GitHub stars n/a Velocity flat History 1 snapshot Edge AI / Health Tech Apr 28 Code High viability
CoRE: Concept-Reasoning Expansion for Continual Brain Lesion Segmentation Build Now
A Continual Learning framework for brain lesion segmentation that integrates visual features with structured concepts to simulate clinical reasoning and guide model growth.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 28 Code High viability
Doing More With Less: Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling Build Now
This research demonstrates that unstructured pruning can augment test-time reasoning performance in LLMs, challenging the notion that pruning always degrades capabilities.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28 Code High viability
ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations Build Now
A speaker-adaptive network for emotion recognition in conversations that calibrates features, gates modalities, and regularizes speaker identity for improved accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Speech and Audio AI Apr 28 Code High viability
Sample-efficient Neuro-symbolic Proximal Policy Optimization Build Now
Neuro-symbolic PPO that uses logical policy specifications to accelerate learning and improve performance in sparse-reward reinforcement learning tasks.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 28 Code High viability
CORAL: Adaptive Retrieval Loop for Culturally-Aligned Multilingual RAG Build Now
CORAL is an adaptive retrieval methodology for multilingual RAG that iteratively refines the retrieval space and the query for culturally aligned answers, improving accuracy by up to 3.58 percentage points on low-resource languages.
GitHub stars n/a Velocity flat History 1 snapshot Multilingual RAG Apr 28 Code High viability
The Forensic Cost of Watermark Removal Build Now
This research introduces a new metric for watermark removal detection, showing that current removal methods leave forensic artifacts and that no existing method balances removal success, quality, and detectability.
GitHub stars n/a Velocity flat History 1 snapshot Digital Forensics Apr 28 Code High viability
SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring Build Now
SIEVES improves multimodal LLM coverage on out-of-distribution tasks by learning to score the quality of visual evidence for selective prediction.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 28 Code High viability
No Pedestrian Left Behind: Real-Time Detection and Tracking of Vulnerable Road Users for Adaptive Traffic Signal Control Build Now
This system uses real-time AI to detect and track vulnerable road users, extending traffic signal timing to prevent pedestrians from being stranded.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 28 Code High viability
FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices Watch
Optimized communication-efficient federated learning solution to fine-tune language models on edge devices.
GitHub stars n/a Velocity flat History 1 snapshot Edge AI Optimization Apr 28 Code
Semantic Layers for Reliable LLM-Powered Data Analytics: A Paired Benchmark of Accuracy and Hallucination Across Three Frontier Models Build Now
A benchmark demonstrating that providing explicit semantic context significantly improves LLM accuracy and reduces hallucinations in data analytics.
GitHub stars n/a Velocity flat History 1 snapshot LLM Data Analytics Apr 28 Code High viability
Scalable Inference Architectures for Compound AI Systems: A Production Deployment Study Build Now
A production-tested inference architecture for enterprise compound AI systems, reducing latency and costs while enabling rapid iteration.
GitHub stars n/a Velocity flat History 1 snapshot AI Infrastructure Apr 28 Code High viability
ADEMA: A Knowledge-State Orchestration Architecture for Long-Horizon Knowledge Synthesis with LLM Agents Build Now
An architecture for long-horizon knowledge synthesis that orchestrates LLM agents with explicit state management and checkpointing for reliable task completion.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 28 Code High viability
Language corpora for the Dutch medical domain Build Now
The first large-scale Dutch medical language corpus, comprising 35 billion tokens, is now available on Hugging Face for pre-training and downstream NLP tasks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28 Code High viability
Generative UI as an Accessibility Bridge: Lessons from C2C E-Commerce Build Now
Generative UI can create adaptive interfaces for user-generated content platforms, bridging accessibility gaps for visually impaired and older users.
GitHub stars n/a Velocity flat History 1 snapshot Generative UI Apr 28 Code High viability
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling Ignore
Develop a platform for optimizing complex models using memory-enhanced decentralized debate among AI agents.
GitHub stars n/a Velocity flat History 1 snapshot AI Optimization Tools Apr 28 Pending
SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents Watch
SnapGuard is a lightweight method for detecting prompt injection attacks in screenshot-based web agents by analyzing visual and textual signals.
GitHub stars n/a Velocity flat History 1 snapshot AI Security Apr 28 Code
One-shot emergency psychiatric triage across 15 frontier AI chatbots Watch
Evaluating 15 frontier AI chatbots for emergency psychiatric triage, finding high accuracy for emergencies but over-triage for lower-risk cases.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 28 Code
How Can Reinforcement Learning Achieve Expert-level Placement? Watch
A reinforcement learning framework that learns expert rewards directly from expert chip layouts to achieve expert-level placement.
GitHub stars n/a Velocity flat History 1 snapshot Chip Design AI Apr 28 Code
Towards Unified Multi-task EEG Analysis with Low-Rank Adaptation Watch
A multi-task EEG analysis framework using low-rank adaptation to efficiently adapt pre-trained models to multiple downstream tasks.
GitHub stars n/a Velocity flat History 1 snapshot EEG Analysis Apr 28 Code
The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents Watch
A recursive sparse mixture-of-experts framework for diffusion models that enhances structured reasoning and text following in image generation.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Apr 28 Code
RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion Watch
A novel framework that decouples retrieval and reranking for multi-modal knowledge graph completion, improving accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Knowledge Graph Completion Apr 28 Code
Learning Illumination Control in Diffusion Models Build Now
An open-source pipeline for controlling illumination in diffusion models using natural language instructions, outperforming existing models.
GitHub stars n/a Velocity flat History 1 snapshot Generative Image Editing Apr 27 Pending High viability
TrialCalibre: A Fully Automated Causal Engine for RCT Benchmarking and Observational Trial Calibration Watch
TrialCalibre is a multi-agent system designed to automate and scale the process of benchmarking and calibrating observational trials against RCTs for credible real-world evidence studies.
GitHub stars n/a Velocity flat History 1 snapshot Causal Inference Apr 28 Code
Can Code Evaluation Metrics Detect Code Plagiarism? Watch
This research evaluates existing code evaluation metrics for their effectiveness in detecting source code plagiarism, suggesting potential for improved automated plagiarism detection tools.
GitHub stars n/a Velocity flat History 1 snapshot Code Plagiarism Detection Apr 28 Code
Toward Scalable Terminal Task Synthesis via Skill Graphs Watch
A framework for synthesizing diverse terminal tasks using skill graphs to improve agent training for command-line execution.
GitHub stars n/a Velocity flat History 1 snapshot Agent Training Apr 28 Code
Cross-Lingual Jailbreak Detection via Semantic Codebooks Watch
A language-agnostic system to detect cross-lingual LLM jailbreaks using semantic similarity without retraining.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 28 Code
An Investigation of Linguistic Biases in LLM-Based Recommendations Watch
Investigating linguistic biases in LLM recommendations across different English and Hindi-English dialects.
GitHub stars n/a Velocity flat History 1 snapshot LLM Bias Analysis Apr 28 Code
Training Transformers as a Universal Computer Watch
Demonstrates that a small transformer can be trained to execute programs in a computationally universal language, acting as a universal computer.
GitHub stars n/a Velocity flat History 1 snapshot AI as Universal Computer Apr 28 Code
Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models Watch
An empirical study of uncertainty estimation for audio-aware large language models to improve reliability in audio understanding tasks.
GitHub stars n/a Velocity flat History 1 snapshot Audio AI Apr 28 Code
VAE-Inf: A statistically interpretable generative paradigm for imbalanced classification Watch
VAE-Inf is a two-stage framework for imbalanced classification that uses a variational autoencoder on majority data to build a reference distribution, then fine-tunes with minority data for statistically interpretable hypothesis testing.
GitHub stars n/a Velocity flat History 1 snapshot Imbalanced Classification Apr 28 Code
Dynamic UGV-UAV Cooperative Path Planning in Uncertain Environments Watch
A cooperative path planning system for ground and aerial vehicles to navigate uncertain road networks, demonstrated on 100 urban road networks.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 28 Code
Frontier Coding Agents Can Now Implement an AlphaZero Self-Play Machine Learning Pipeline For Connect Four That Performs Comparably to an External Solver Build Now
Frontier coding agents can now autonomously implement complex ML pipelines, demonstrating a potential early warning signal for recursive self-improvement in AI research.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 27 Pending High viability
GAIA-v2-LILT: Multilingual Adaptation of Agent Benchmark beyond Translation Build Now
A refined workflow and dataset for creating valid multilingual agent benchmarks that go beyond simple translation, improving agent performance by up to 32.7% over machine-translated versions.
GitHub stars n/a Velocity flat History 1 snapshot Multilingual Agent Benchmarks Apr 27 Pending High viability
Measuring the Sensitivity of Classification Models with the Error Sensitivity Profile Watch
A novel metric and toolset to identify and prioritize data errors that most impact machine learning model performance, enabling targeted data cleaning.
GitHub stars n/a Velocity flat History pending ML Model Debugging Apr 28 Code
Co-Director: Agentic Generative Video Storytelling Build Now
A hierarchical multi-agent framework that optimizes video storytelling for coherent narratives and personalized advertising.
GitHub stars n/a Velocity flat History 1 snapshot Generative Video Agents Apr 27 Code High viability
BenchGuard: Who Guards the Benchmarks? Automated Auditing of LLM Agent Benchmarks Build Now
BenchGuard is an automated auditing framework that uses LLMs to find critical flaws in agent benchmarks, significantly reducing costs and improving reliability.
GitHub stars n/a Velocity flat History 1 snapshot AI Benchmarking Apr 27 Code High viability
When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient Ignore
This paper categorizes imperfect rewards in policy gradient methods, showing that some errors can be beneficial for language model training.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28 Pending
Assistants, Not Architects: The Role of LLMs in Networked Systems Design Ignore
A framework that uses structured specifications and optimization to design networked systems, outperforming LLMs in constraint satisfaction and explainability.
GitHub stars n/a Velocity flat History 1 snapshot AI for Systems Design Apr 28 Code
HotComment: A Benchmark for Evaluating Popularity of Online Comments Ignore
Introduces HotComment, a multimodal benchmark and StyleCmt method for evaluating online comment popularity by considering content quality, prediction trends, and user behavior.
GitHub stars n/a Velocity flat History 1 snapshot Comment Popularity Benchmark Apr 28 Code
The Surprising Effectiveness of Canonical Knowledge Distillation for Semantic Segmentation Watch
Canonical knowledge distillation achieves state-of-the-art semantic segmentation performance with significantly less compute than current complex methods.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 28 Pending
Emotive Architectures: The Role of LLMs in Adjusting Work Environments Ignore
Investigating the use of LLMs to dynamically adjust work environments for enhanced focus, well-being, and engagement in hybrid settings.
GitHub stars n/a Velocity flat History 1 snapshot LLM Applications Apr 28 Code
Multi-action Tangled Program Graphs for Multi-task Reinforcement Learning with Continuous Control Ignore
A genetic programming algorithm for multi-task reinforcement learning in continuous control environments with interpretable decision flows.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 28 Code
Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills Watch
SkillGuard-Robust is a novel system for auditing untrusted agent skills, achieving high accuracy in detecting malicious intent and ensuring robustness against semantic rewrites.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 28
G-Loss: Graph-Guided Fine-Tuning of Language Models Ignore
A graph-guided loss function that improves fine-tuning of language models by incorporating global semantic structure for more discriminative embeddings.
GitHub stars n/a Velocity flat History 1 snapshot LLM Fine-Tuning Apr 28 Code
Improving Zero-Shot Offline RL via Behavioral Task Sampling Ignore
Improves zero-shot offline reinforcement learning by extracting task vectors from the dataset for more principled policy training.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 28 Code
Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives Ignore
Evaluating LLMs' understanding of embodied cognition and cultural variation using cross-linguistic demonstratives.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 28 Code
Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate Build Now
Distill multi-agent debate into a single LLM for improved reasoning and controllable harmful behavior, using significantly fewer tokens.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 27 Pending High viability
ViPO: Visual Preference Optimization at Scale Build Now
Scales visual preference optimization for generative models by introducing a novel dataset and a robust training objective that adapts to data quality.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Apr 27 Pending High viability
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization Build Now
Improves generative model alignment with complex human preferences by using a semi-supervised approach to handle noisy preference data.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Apr 27 Pending High viability
BifDet: A 3D Bifurcation Detection Dataset for Airway-Tree Modeling Build Now
A new dataset and baseline models for 3D airway bifurcation detection in CT scans, enabling specialized tools for respiratory disease analysis.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 27 Pending High viability
Large Language Models Explore by Latent Distilling Build Now
Exploratory Sampling (ESamp) is a decoding approach for LLMs that uses a lightweight distiller to encourage semantic diversity during generation, boosting reasoning and creative writing performance with minimal overhead.
GitHub stars n/a Velocity flat History 1 snapshot LLM Exploration Apr 27 Pending High viability
S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models Build Now
A framework for distilling large audio foundation models into smaller, deployable versions using only output embeddings, reducing size by up to 61x while retaining 96% performance.
GitHub stars n/a Velocity flat History 1 snapshot Audio Foundation Models Apr 27 Pending High viability
CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation Watch
This paper compares traditional and LLM-based methods for estimating recipe nutrients, highlighting a trade-off between accuracy and computational cost.
Food AI Apr 28
PHISHREV: A Hybrid Machine Learning and Post-Hoc Non-monotonic Reasoning Framework for Context-Aware Phishing Website Classification Watch
A hybrid framework combines machine learning with non-monotonic reasoning to improve context-aware phishing website classification and allows for easy knowledge updates.
GitHub stars n/a Velocity flat History 1 snapshot Cybersecurity AI Apr 28
How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum Watch
A novel loss function continuum that mitigates cold-start stalling in reasoning models by dynamically adjusting supervision commitment.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28
Libra-VLA: Achieving Learning Equilibrium via Asynchronous Coarse-to-Fine Dual-System Build Now
Libra-VLA is a dual-system VLA architecture for robotic manipulation that decouples coarse planning from fine action refinement, enabling scalable, robust, and responsive open-world manipulation.
GitHub stars n/a Velocity flat History 1 snapshot Robotic Manipulation Apr 27 Code High viability
ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models Build Now
Enables parameter-efficient multi-anchor word representations in large language models through novel projection and encoding techniques.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 27 Code High viability
EVT-Based Generative AI for Tail-Aware Channel Estimation Build Now
Enhancing URLLC in 5G+ networks by integrating extreme value theory with generative AI for tail-aware channel estimation.
GitHub stars n/a Velocity flat History 1 snapshot Wireless Communications AI Apr 27 Code High viability
Analyzing LLM Reasoning to Uncover Mental Health Stigma Build Now
This research analyzes LLM reasoning steps to uncover and categorize hidden mental health stigma, offering a more robust evaluation than traditional methods.
GitHub stars n/a Velocity flat History 1 snapshot LLM Analysis Apr 27 Code High viability
asRoBallet: Closing the Sim2Real Gap via Friction-Aware Reinforcement Learning for Underactuated Spherical Dynamics Build Now
A friction-aware reinforcement learning system for humanoid ballbot hardware, achieving zero-shot Sim2Real transfer.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 27 Code High viability
Sparse Personalized Text Generation with Multi-Trajectory Reasoning Build Now
A reinforcement learning framework for cold-start LLM personalization that jointly reasons over user writing styles and topic preferences to improve generation quality.
GitHub stars n/a Velocity flat History 1 snapshot LLM Personalization Apr 27 Code High viability
Adaptive Prompt Embedding Optimization for LLM Jailbreaking Build Now
This research introduces a novel white-box jailbreaking technique for LLMs that directly optimizes prompt embeddings, preserving semantic content and outperforming existing attacks.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 27 Code High viability
QAROO: AI-Driven Online Task Offloading for Energy-Efficient and Sustainable MEC Networks Ignore
An AI-driven framework for offloading tasks in mobile edge computing networks to optimize energy and computing resources.
GitHub stars n/a Velocity flat History 1 snapshot MEC Networks Apr 28 Code
Investigation into In-Context Learning Capabilities of Transformers Ignore
An empirical investigation into the in-context learning capabilities of transformers, focusing on scaling behavior and benign overfitting.
GitHub stars n/a Velocity flat History 1 snapshot LLM Theory Apr 28 Pending
Semi-Markov Reinforcement Learning for City-Scale EV Ride-Hailing with Feasibility-Guaranteed Actions Ignore
A semi-Markov reinforcement learning approach for city-scale EV ride-hailing that guarantees feasibility of dispatch, repositioning, and charging decisions.
GitHub stars n/a Velocity flat History 1 snapshot Autonomous Systems Apr 28
Large language models eroding science understanding: an experimental study Ignore
Demonstrates how large language models can be manipulated to spread misinformation by prioritizing fringe scientific material, posing risks to public understanding.
GitHub stars n/a Velocity flat History 1 snapshot LLM Misinformation Risk Apr 28 Code
From World-Gen to Quest-Line: A Dependency-Driven Prompt Pipeline for Coherent RPG Generation Ignore
A dependency-driven prompt pipeline for generating coherent and scalable RPG content using LLMs.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI for Games Apr 28
Safe-Support Q-Learning: Learning without Unsafe Exploration Ignore
A Q-learning framework for reinforcement learning that eliminates unsafe state visitation during training by leveraging a behavior policy supported on a safe set.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 28 Code
StratFormer: Adaptive Opponent Modeling and Exploitation in Imperfect-Information Games Ignore
A transformer-based meta-agent that learns to model and exploit opponents in imperfect-information games by adapting its strategy based on opponent behavior.
GitHub stars n/a Velocity flat History 1 snapshot Game AI Apr 28
Threat-Oriented Digital Twinning for Security Evaluation of Autonomous Platforms Ignore
A threat-oriented digital twinning methodology for evaluating the cybersecurity of autonomous platforms, adaptable for UAV and space systems.
Security AI Apr 28
ValueAlpha: Agreement-Gated Stress Testing of LLM-Judged Investment Rationales Before Returns Are Observable Ignore
ValueAlpha is a stress-testing protocol for LLM-judged investment rationales, ensuring claims are stable and agreed upon before returns are observable.
GitHub stars n/a Velocity flat History 1 snapshot AI Finance Evaluation Apr 28
A Faceted Proposal for Transparent Attribution of AI-Assisted Text Production Ignore
A faceted model for transparent attribution of AI-assisted text production, detailing the form, generation, evaluation, intent, control, and traceability of AI interventions.
GitHub stars n/a Velocity flat History 1 snapshot AI Ethics Apr 28 Code
Medoid Prototype Alignment for Cross-Plant Unknown Attack Detection in Industrial Control Systems Ignore
A medoid prototype alignment framework for detecting unknown attacks in industrial control systems across different plants.
GitHub stars n/a Velocity flat History 1 snapshot Industrial AI Apr 28
Automated Adversarial Collaboration for Advancing Theory Building in the Cognitive Sciences Ignore
An automated adversarial collaboration framework uses LLMs and program synthesis to adjudicate between competing cognitive science theories.
GitHub stars n/a Velocity flat History 1 snapshot AI for Science Apr 28
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence Watch
Nemotron 3 Nano Omni is an open multimodal AI model with native audio support, offering improved accuracy and efficiency for real-world applications.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 27 High viability
Assessing Y-Axis Influence: Bias in Multimodal Language Models on Chart-to-Table Translation Watch
A framework to analyze and mitigate y-axis bias in multimodal language models for chart-to-table translation, improving performance and fairness.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 27 Code
The Role of Symmetry in Optimizing Overparameterized Networks Ignore
Analyzes how overparameterization in neural networks introduces symmetries that improve optimization and accelerate convergence.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28
Where Did It Go Wrong? Capability-Oriented Failure Attribution for Vision-and-Language Navigation Agents Ignore
A new method for attributing failures in vision-language navigation agents to specific capabilities, improving interpretability and guidance for agent development.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 28
Co-Writing with AI: An Empirical Study of Diverse Academic Writing Workflows Ignore
An empirical study exploring how university students integrate AI tools into diverse academic writing workflows, identifying three recurring configurations.
GitHub stars n/a Velocity flat History 1 snapshot AI in Education Apr 28
A theoretical framework for bandit problems with payoffs smooth on a graph, applicable to online learning and content-based recommendation.
GitHub stars n/a Velocity flat History 1 snapshot Online Learning Apr 28
DualFact+: A Multimodal Fact Verification Framework for Procedural Video Understanding Ignore
A framework for evaluating factual correctness in procedural video captions by separating conceptual and contextual facts.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 28
UnIte: Uncertainty-based Iterative Document Sampling for Domain Adaptation in Information Retrieval Ignore
A novel method for unsupervised domain adaptation in information retrieval that improves document sampling by considering model uncertainty.
GitHub stars n/a Velocity flat History 1 snapshot Information Retrieval Apr 28 Pending
Spreadsheet Modeling Experiments Using GPTs on Small Problem Statements and the Wall Task Ignore
GPT-based tools show promise for assisting in spreadsheet model building but remain unreliable for professional use due to inconsistency and reproducibility issues.
GitHub stars n/a Velocity flat History 1 snapshot Spreadsheet AI Apr 28
Plausible but Wrong: A case study on Agentic Failures in Astrophysical Workflows Ignore
This paper evaluates agentic AI systems in astrophysical workflows, highlighting silent failures where systems produce plausible but incorrect results, and releases an evaluation framework.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 28
Below-Chance Blindness: Prompted Underperformance in Small LLMs Produces Positional Bias Rather than Answer Avoidance Ignore
Investigates prompted underperformance in small LLMs, finding that it produces positional bias rather than answer avoidance, and suggests positional distribution shifts as a more effective detection signature than below-chance accuracy.
GitHub stars n/a Velocity flat History 1 snapshot LLM Behavior Analysis Apr 28 Pending
Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories Ignore
Reveals hidden linear-centroid coupling in neural networks by analyzing gradient directions, impacting feature concentration and grokking.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28 Pending
Making AI-Assisted Grant Evaluation Auditable without Exposing the Model Ignore
A TEE-based architecture for auditable AI grant evaluation without exposing the model, focusing on remote attestation and prompt injection risks.
GitHub stars n/a Velocity flat History 1 snapshot AI Auditing Apr 28
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers Ignore
This research identifies conditional misalignment in language models, where common interventions can mask emergent misbehavior that reappears in specific contexts.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 28
spectroxide: A code package for computing cosmic microwave background spectral distortions Ignore
An AI-assisted Rust code package for computing cosmic microwave background spectral distortions, offering a novel approach to scientific software development.
GitHub stars n/a Velocity flat History 1 snapshot Scientific Computing Apr 27 Pending
Toward a Functional Geometric Algebra for Natural Language Semantics Ignore
Proposes geometric algebra as a superior mathematical foundation for natural language semantics, offering enhanced compositionality and interpretability over linear algebra.
GitHub stars n/a Velocity flat History 1 snapshot NLP Semantics Apr 28
Knowledge Distillation Must Account for What It Loses Ignore
This paper proposes a framework for accountable knowledge distillation, focusing on preserving teacher capabilities beyond simple task scores to ensure reliable student models.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28
Scalable Secure Biometric Authentication without Auxiliary Identifiers Watch
A novel system for scalable, secure biometric authentication against cloud databases, combining AI with cryptography to prevent data breaches.
GitHub stars n/a Velocity flat History 1 snapshot Secure Biometrics Apr 27
Three Models of RLHF Annotation: Extension, Evidence, and Authority Ignore
This paper theoretically distinguishes three models of human feedback for large language models to improve annotation strategies.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 28
Dual-Track CoT: Budget-Aware Stepwise Guidance for Small LMs Ignore
Improving small language model reasoning under compute and token constraints through stepwise guidance.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 27 Pending
Optimally Auditing Adversarial Agents Ignore
Develops algorithms for designing strategic audits to mitigate fraud in resource allocation domains by modeling audit policy design as a principal-agent game.
GitHub stars n/a Velocity flat History 1 snapshot Game Theory / AI Apr 28
Kohn-Sham Hamiltonian from Effective Field Theory: Quasiparticle Band Narrowing from Frozen Core Dynamics Ignore
An effective field theory approach for deriving quasiparticle band narrowing from frozen core dynamics in metals, potentially applicable to AI-driven scientific discovery.
GitHub stars n/a Velocity flat History 1 snapshot Materials Science AI Apr 28
Faithful Autoformalization via Roundtrip Verification and Repair Ignore
Ensuring LLM formalizations are faithful through roundtrip verification and targeted repair.
GitHub stars n/a Velocity flat History 1 snapshot LLM Formalization Apr 27
Toward a Science of Intent: Closure Gaps and Delegation Envelopes for Open-World AI Agents Ignore
A framework for transforming partially specified human intent into inspectable artifacts for open-world AI agents by formalizing closure gaps and defining delegation envelopes.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 27 Code
AI as Consumer and Participant: A Co-Design Agenda for MBSE Substrates and Methodology Ignore
Current AI tools interact with MBSE models as prompt engines, not knowledge bases, necessitating a co-design of models and methodologies for true AI participation.
GitHub stars n/a Velocity flat History 1 snapshot AI for Engineering Apr 28
The Nonverbal Syntax Framework: An Evidence-Based Tiered System for Inferring Learner States from Observable Behavioral Cues Ignore
A comprehensive framework for inferring learner cognitive and affective states from observable nonverbal cues, based on a systematic review of existing research.
GitHub stars n/a Velocity flat History 1 snapshot Learner State Inference Apr 28
Frictive Policy Optimization for LLMs: Epistemic Intervention, Risk-Sensitive Control, and Reflective Alignment Ignore
A framework for learning language model policies that manage risk through explicit intervention actions, going beyond surface-level alignment.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 28
Sustained Gradient Alignment Mediates Subliminal Learning in a Multi-Step Setting: Evidence from MNIST Auxiliary Logit Distillation Experiment Ignore
Investigates the persistence of gradient alignment in multi-step learning settings and its causal contribution to unintended trait acquisition in neural networks.
GitHub stars n/a Velocity flat History 1 snapshot AI Theory / ML Research Apr 28 Pending
Value-Sensitive AI for Prayer: Balancing the Agencies Between Human and AI Agents in Spiritual Context Ignore
This paper explores value-sensitive AI systems for prayer, arguing for designs that preserve user agency and authenticity by leveraging AI's inexplicability or non-use.
GitHub stars n/a Velocity flat History 1 snapshot AI Ethics & Spirituality Apr 28
On Halting vs Converging in Recurrent Graph Neural Networks Ignore
This paper theoretically analyzes the expressiveness of different Recurrent Graph Neural Network models and establishes relationships between them.
GitHub stars n/a Velocity flat History 1 snapshot Graph Neural Networks Apr 28
Learning with Embedded Linear Equality Constraints via Variational Bayesian Inference Ignore
A Bayesian framework to embed linear relationships into machine learning for improved uncertainty estimates and constraint satisfaction.
GitHub stars n/a Velocity flat History 1 snapshot Scientific ML Apr 27
MultiHedge: Adaptive Coordination via Retrieval-Augmented Control Ignore
A hybrid architecture using LLMs and retrieval to improve robustness and stability in modular decision systems.
GitHub stars n/a Velocity flat History 1 snapshot LLM Coordination Apr 27
SUDP: Secret-Use Delegation Protocol for Agentic Systems Ignore
A protocol for agents to use user secrets for operations without exposing reusable authority.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 27
Compute Aligned Training: Optimizing for Test Time Inference Ignore
This paper proposes a new training methodology for LLMs that aligns training objectives with test-time inference strategies to improve performance.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 27
Verification of Neural Networks (Lecture Notes) Ignore
Theoretical introduction to the verification of neural networks, covering various architectures and verification techniques.
GitHub stars n/a Velocity flat History 1 snapshot AI Theory Apr 28
Barriers and Enablers of Online Instruction in Hospitality Education in the Philippines: An Exploratory Study Ignore
An exploratory study identifies barriers and enablers for online instruction in hospitality education, highlighting the need for improved pedagogical training and AI integration support.
GitHub stars n/a Velocity flat History 1 snapshot Education Technology Apr 27
Rethinking Layer Redundancy in Large Language Models: Calibration Objectives and Search for Depth Pruning Ignore
Examines layer redundancy in LLMs from a functional perspective, suggesting that calibration objectives are more influential than search algorithms for depth pruning.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 27
Leverage Laws: A Per-Task Framework for Human-Agent Collaboration Ignore
A per-task leverage ratio framework is proposed to quantify human-agent collaboration efficiency, decomposing information flow and identifying irreducible task novelty.
GitHub stars n/a Velocity flat History 1 snapshot Human-Agent Collaboration Apr 27
Internet of Everything in the 6G Era: Paradigms, Enablers, Potentials and Future Directions Ignore
Exploring the paradigms, enablers, and future directions of the Internet of Everything in the 6G era.
GitHub stars n/a Velocity flat History 1 snapshot 6G Networks Apr 27
Transformer Approximations from ReLUs Ignore
Develops a theoretical framework for translating ReLU approximation results to softmax attention mechanisms in transformers.
GitHub stars n/a Velocity flat History 1 snapshot LLM Theory Apr 27