MAIC-UI: Making Interactive Courseware with Generative UI Build Now
A zero-code platform that empowers educators to create interactive STEM courseware without programming expertise.
GitHub 100 stars Velocity flat History 1 snapshot Generative UI Apr 28 Pending High viability
M$^3$-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering Build Now
A new benchmark for testing multimodal AI models on complex multi-hop reasoning across texts and visuals.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal VQA Apr 28 Pending High viability
Faithfulness-QA: A Counterfactual Entity Substitution Dataset for Training Context-Faithful RAG Models Build Now
Faithfulness-QA is a large counterfactual entity-substitution dataset for training retrieval-augmented language models to produce context-faithful output.
GitHub 100 stars Velocity flat History 1 snapshot RAG Apr 28 Pending High viability
DATAREEL: Automated Data-Driven Video Story Generation with Animations Build Now
Automate data-driven video story generation with a multi-agent framework and a new benchmark for evaluation.
GitHub stars n/a Velocity flat History pending Generative Video Apr 28 Pending High viability
Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest Build Now
Develop an AI platform for strategic coordination and multiplayer game scenarios to enhance cooperation in competitive environments.
GitHub stars n/a Velocity flat History 1 snapshot AI and Machine Learning Apr 28 Code High viability
Action-Aware Generative Sequence Modeling for Short Video Recommendation Build Now
AI-driven short video recommendation system that captures user preferences to enhance engagement.
GitHub stars n/a Velocity flat History 1 snapshot Video Recommendation Apr 28 Code High viability
Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models Build Now
A plug-and-play method to reduce hallucinations in vision-language models by intervening during the prefill stage, improving initial representations.
GitHub stars n/a Velocity flat History 1 snapshot LLM Hallucination Mitigation Apr 28 Pending High viability
From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems Build Now
A zero-trust semantic gateway for secure, validated autonomous agent interaction in enterprise systems.
GitHub stars n/a Velocity flat History pending Agents Apr 28 Pending High viability
TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline Reinforcement Learning Build Now
TSN-Affinity is a continual offline reinforcement learning method that enables task-specific parameterization and knowledge sharing through similarity-guided architectural reuse.
GitHub stars n/a Velocity flat History pending Continual RL Apr 28 Pending High viability
Health System Scale Semantic Search Across Unstructured Clinical Notes Build Now
A health-system-scale semantic search system for clinical notes that reduces chart abstraction time by up to 89% at a low operational cost.
GitHub stars n/a Velocity flat History pending Medical AI Apr 28 Pending High viability
QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention Build Now
QFlash enables end-to-end integer-only FlashAttention for Vision Transformers, significantly speeding up computation and reducing energy consumption.
GitHub stars n/a Velocity flat History pending Vision Transformer Optimization Apr 28 Pending High viability
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery Watch
AutoResearchBench is a challenging new benchmark for AI agents performing complex scientific literature discovery.
GitHub stars n/a Velocity flat History 1 snapshot AI Benchmarking Apr 28 Pending
DiRe-RAPIDS: Topology-faithful dimensionality reduction at scale Build Now
Achieve topology-faithful dimensionality reduction at scale, outperforming UMAP in preserving global structure.
GitHub stars n/a Velocity flat History pending Dimensionality Reduction Apr 28 Pending High viability
Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization Build Now
An agentic AI framework that uses LLMs and simulation to explore and optimize computer architecture designs, outperforming the state of the art.
GitHub stars n/a Velocity flat History pending AI for Hardware Design Apr 28 Code High viability
PI-TTA: Physics-Informed Source-Free Test-Time Adaptation for Robust Human Activity Recognition on Mobile Devices Build Now
Physics-informed test-time adaptation for robust human activity recognition on mobile devices, enabling on-device personalization without centralizing private data.
GitHub stars n/a Velocity flat History pending Human Activity Recognition Apr 28 Code High viability
Recursive Multi-Agent Systems Build Now
A recursive multi-agent framework that enhances reasoning and efficiency in complex tasks by enabling agent collaboration through a unified latent-space computation.
GitHub stars n/a Velocity flat History pending Multi-Agent Systems Apr 28 Code High viability
AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices Watch
AHASD is a novel asynchronous heterogeneous architecture for LLM speculative decoding on mobile devices, significantly improving throughput and energy efficiency.
GitHub stars n/a Velocity flat History pending LLM Inference Optimization Apr 28 Pending
LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation Build Now
LLM-ReSum is a self-reflective summarization framework that improves factual accuracy by up to 33% and coverage by 39% through an LLM-based feedback loop.
GitHub stars n/a Velocity flat History 1 snapshot LLM Summarization Apr 28 Code High viability
Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling Build Now
Open-source, highly sparse multilingual Mixture-of-Experts language models with efficient upcycling and best-in-class performance-to-compute ratio.
GitHub stars n/a Velocity flat History pending LLM Training Apr 28 Code High viability
RESTestBench: A Benchmark for Evaluating the Effectiveness of LLM-Generated REST API Test Cases from NL Requirements Build Now
A benchmark and metric for evaluating LLM-generated REST API tests from natural language requirements, improving functional validation.
GitHub stars n/a Velocity flat History pending LLM Testing Apr 28 Code High viability
OxyGent: Making Multi-Agent Systems Modular, Observable, and Evolvable via Oxy Abstraction Build Now
OxyGent is an open-source framework for building modular, observable, and evolvable multi-agent systems using a Lego-like abstraction.
GitHub stars n/a Velocity flat History pending Agents Apr 28 Code High viability
SAFEdit: Does Multi-Agent Decomposition Resolve the Reliability Challenges of Instructed Code Editing? Build Now
A multi-agent framework that decomposes instructed code editing into specialized roles to improve reliability and reduce unintended changes.
GitHub stars n/a Velocity flat History pending Code Editing Apr 28 Code High viability
The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models Build Now
A multi-source benchmark and evaluation pipeline for assessing structured output quality in large language models across text, image, and audio inputs.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 28 Code High viability
Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling Build Now
A fast, zero-shot detector for machine-generated text that leverages perplexity shifts under text shuffling, outperforming existing methods.
GitHub stars n/a Velocity flat History 1 snapshot AI Content Detection Apr 28 Code High viability
BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate Build Now
Generate synthetic training data for custom LLM guardrails using debate and dimension decomposition, eliminating the need for human annotation.
GitHub stars n/a Velocity flat History pending LLM Guardrails Apr 28 Code High viability
From Insight to Action: A Novel Framework for Interpretability-Guided Data Selection in Large Language Models Build Now
Interpretability-Guided Data Selection (IGDS) framework that leverages internal LLM task features to select highly effective data for fine-tuning, improving efficiency and performance.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 28 Code High viability
LegalMidm: Use-Case-Driven Legal Domain Specialization for Korean Large Language Model Build Now
A Korean legal-domain LLM trained with a use-case-driven framework and rigorous data curation for practical legal tasks.
GitHub stars n/a Velocity flat History pending LLM Specialization Apr 28 Code High viability
PSI-Bench: Towards Clinically Grounded and Interpretable Evaluation of Depression Patient Simulators Build Now
PSI-Bench is an interpretable evaluation framework for depression patient simulators, providing clinically grounded diagnostics to improve realism and guide future development.
GitHub stars n/a Velocity flat History pending Mental Health AI Apr 28 Code High viability
Learning Generalizable Multimodal Representations for Software Vulnerability Detection Build Now
A multimodal framework that leverages code and comments to significantly improve software vulnerability detection accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Software Vulnerability Detection Apr 28 Code High viability
DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams Build Now
DRAGON is a new benchmark and dataset for evaluating evidence-grounded visual reasoning in diagrams, enabling more reliable assessment of VLM capabilities in diagram question answering.
GitHub stars n/a Velocity flat History pending Visual Reasoning Benchmark Apr 28 Code High viability
SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton Build Now
SymphonyGen is a 3D hierarchical framework for generating cinematic orchestral music with controllable harmony skeletons and improved musicality.
GitHub stars n/a Velocity flat History pending Generative Music Apr 28 Code High viability
SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials Build Now
A benchmark and fine-tuned LLM for automatically evaluating K-12 science instructional materials.
GitHub stars n/a Velocity flat History pending AI for Education Apr 28 Code High viability
GPT-Image-2 in the Wild: A Twitter Dataset of Self-Reported AI-Generated Images from the First Week of Deployment Build Now
A dataset of 10,217 confirmed GPT-Image-2 generated images from Twitter, including analysis of content and a finding that C2PA credentials are stripped.
GitHub stars n/a Velocity flat History pending Generative Image Apr 28 Code High viability
Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation Build Now
An LLM agent framework that automates end-to-end 3D cutscene generation by integrating with game engines and orchestrating specialist sub-agents.
GitHub stars n/a Velocity flat History pending Agents Apr 28 Code High viability
SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents Build Now
SnapGuard is a lightweight, fast method for detecting prompt injection attacks in screenshot-based web agents by analyzing visual and textual signals.
GitHub stars n/a Velocity flat History pending Web Agents Apr 28 Code High viability
JURY-RL: Votes Propose, Proofs Dispose for Label-Free RLVR Build Now
A label-free reinforcement learning framework for LLMs that decouples answer proposal from reward verification, improving mathematical reasoning and code generation.
GitHub stars n/a Velocity flat History pending LLM Reasoning Apr 28 Code High viability
Toward Scalable Terminal Task Synthesis via Skill Graphs Build Now
A framework for synthesizing diverse terminal tasks using skill graphs to improve agent training for command-line execution.
GitHub stars n/a Velocity flat History pending Agents Apr 28 Code High viability
DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing Build Now
A decoupled reinforcement learning framework for reasoning-driven image editing that optimizes planning independently from the generative model.
GitHub stars n/a Velocity flat History pending AI for Image Editing Apr 28 Code High viability
GraphPL: Leveraging GNN for Efficient and Robust Modalities Imputation in Patchwork Learning Build Now
GraphPL is a graph neural network approach for robust modality imputation in patchwork learning, enabling downstream tasks such as disease prediction.
GitHub stars n/a Velocity flat History pending Multi-modal Learning Apr 28 Code High viability
Think Before You Act -- A Neurocognitive Governance Model for Autonomous AI Agents Build Now
A neurocognitive governance framework embeds self-governance into autonomous AI agents, achieving 95% compliance accuracy in a retail supply chain workflow.
GitHub stars n/a Velocity flat History pending AI Agents Apr 28 Code High viability
Cross-Lingual Jailbreak Detection via Semantic Codebooks Build Now
A language-agnostic guardrail for LLMs that detects cross-lingual jailbreaks using semantic similarity to an English prompt codebook, without retraining.
GitHub stars n/a Velocity flat History pending LLM Safety Apr 28 Code High viability
R$^3$-SQL: Ranking Reward and Resampling for Text-to-SQL Build Now
R3-SQL is a Text-to-SQL framework that improves query ranking consistency and recall through unified reward and agentic resampling, achieving state-of-the-art results.
GitHub stars n/a Velocity flat History pending Text-to-SQL Apr 28 Code High viability
Towards Agentic Investigation of Security Alerts Build Now
An agentic workflow using LLMs and structured queries to automate the initial stages of security alert investigation, improving accuracy and reducing manual workload.
GitHub stars n/a Velocity flat History pending Security AI Apr 28 Code High viability
At the Edge of the Heart: ULP FPGA-Based CNN for On-Device Cardiac Feature Extraction in Smart Health Sensors for Astronauts Build Now
An ultra-low-power FPGA-based CNN for on-device cardiac feature extraction, enabling autonomous health monitoring for astronauts.
GitHub stars n/a Velocity flat History pending Edge AI / Health Tech Apr 28 Code High viability
CoRE: Concept-Reasoning Expansion for Continual Brain Lesion Segmentation Build Now
A framework for continual brain lesion segmentation that integrates visual features with structured concepts to simulate clinical reasoning and guide model growth.
GitHub stars n/a Velocity flat History pending Medical AI Apr 28 Code High viability
Doing More With Less: Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling Build Now
This research demonstrates that unstructured pruning can augment test-time reasoning performance in LLMs, challenging the notion that pruning always degrades capabilities.
GitHub stars n/a Velocity flat History pending LLM Optimization Apr 28 Code High viability
RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion Build Now
A framework that decouples retrieval and reranking for more accurate multi-modal knowledge graph completion.
GitHub stars n/a Velocity flat History pending Knowledge Graph Completion Apr 28 Code High viability
ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations Build Now
A speaker-adaptive network for emotion recognition in conversations that calibrates features, gates modalities, and regularizes speaker identity for improved accuracy.
GitHub stars n/a Velocity flat History pending Emotion Recognition Apr 28 Code High viability
Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models Build Now
An empirical study of uncertainty estimation for audio-aware LLMs, revealing that semantic-level methods are better for reasoning but model-dependent for trustworthiness.
GitHub stars n/a Velocity flat History pending Audio LLMs Apr 28 Code High viability
Sample-efficient Neuro-symbolic Proximal Policy Optimization Build Now
Sample-efficient neuro-symbolic PPO that uses logical policy specifications to accelerate learning in sparse-reward environments.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 28 Code High viability
CORAL: Adaptive Retrieval Loop for Culturally-Aligned Multilingual RAG Build Now
CORAL is an adaptive retrieval methodology for multilingual RAG that iteratively refines the retrieval space and query for culturally aligned answers, improving accuracy by up to 3.58 percentage points.
GitHub stars n/a Velocity flat History pending Multilingual RAG Apr 28 Code High viability
The Forensic Cost of Watermark Removal Build Now
A new evaluation metric for watermark removal that focuses on forensic detectability, revealing artifacts missed by current methods.
GitHub stars n/a Velocity flat History pending Digital Watermarking Apr 28 Code High viability
SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring Build Now
SIEVES improves multimodal LLM coverage on out-of-distribution tasks by learning to score the quality of visual evidence for selective prediction.
GitHub stars n/a Velocity flat History pending Multimodal AI Apr 28 Code High viability
No Pedestrian Left Behind: Real-Time Detection and Tracking of Vulnerable Road Users for Adaptive Traffic Signal Control Build Now
A real-time adaptive traffic signal system that uses computer vision to extend crossing times for vulnerable road users, improving safety and reducing delays.
GitHub stars n/a Velocity flat History pending Computer Vision Apr 28 Code High viability
FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices Watch
A communication-efficient federated fine-tuning method for language models on edge devices, based on Fisher-guided token quantization.
GitHub stars n/a Velocity flat History 1 snapshot Edge AI Optimization Apr 28 Code
Semantic Layers for Reliable LLM-Powered Data Analytics: A Paired Benchmark of Accuracy and Hallucination Across Three Frontier Models Build Now
A benchmark showing that providing explicit business semantics to LLMs significantly improves accuracy and reduces hallucinations in data analytics queries.
GitHub stars n/a Velocity flat History pending LLM Data Analytics Apr 28 Code High viability
Scalable Inference Architectures for Compound AI Systems: A Production Deployment Study Build Now
A production-ready inference architecture for scalable, cost-effective deployment of complex, multi-model AI systems.
GitHub stars n/a Velocity flat History 1 snapshot AI Infrastructure Apr 28 Code High viability
ADEMA: A Knowledge-State Orchestration Architecture for Long-Horizon Knowledge Synthesis with LLM Agents Build Now
An architecture for long-horizon knowledge synthesis that orchestrates LLM agents with explicit state tracking and robust error handling for complex tasks.
GitHub stars n/a Velocity flat History pending Agents Apr 28 Code High viability
Generative UI as an Accessibility Bridge: Lessons from C2C E-Commerce Build Now
Generative UI can create adaptive interfaces for user-generated content platforms, bridging accessibility gaps for visually impaired and older users.
GitHub stars n/a Velocity flat History pending Generative UI Apr 28 Code High viability
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling Ignore
Develop a platform for optimization modeling using memory-enhanced LLM agents that engage in decentralized debate.
GitHub stars n/a Velocity flat History 1 snapshot AI Optimization Tools Apr 28 Pending
One-shot emergency psychiatric triage across 15 frontier AI chatbots Watch
Evaluating 15 frontier AI chatbots on psychiatric triage, this study reveals high accuracy for emergencies but significant over-triage for lower-risk presentations.
GitHub stars n/a Velocity flat History pending Medical AI Apr 28 Code
How Can Reinforcement Learning Achieve Expert-level Placement? Watch
A reinforcement learning framework that learns expert-level chip placement by inferring implicit rewards from expert layouts, improving wirelength optimization.
GitHub stars n/a Velocity flat History pending Chip Design AI Apr 28 Code
Towards Unified Multi-task EEG Analysis with Low-Rank Adaptation Watch
A multi-task EEG analysis framework using LoRA to enable simultaneous adaptation of pre-trained models to multiple downstream tasks, improving efficiency and performance.
GitHub stars n/a Velocity flat History pending EEG Analysis Apr 28 Code
Training Transformers as a Universal Computer Watch
A small transformer model trained to execute programs in MicroPy, demonstrating the potential for transformers to act as universal computers.
GitHub stars n/a Velocity flat History pending AI as Universal Computer Apr 28 Code
TrialCalibre: A Fully Automated Causal Engine for RCT Benchmarking and Observational Trial Calibration Watch
TrialCalibre is a multi-agent system designed to automate and scale the BenchExCal framework for causal effect estimation in real-world evidence studies.
GitHub stars n/a Velocity flat History pending Causal Inference Apr 28 Code
Can Code Evaluation Metrics Detect Code Plagiarism? Watch
This research evaluates existing code evaluation metrics for their effectiveness in detecting source code plagiarism, suggesting that some metrics can be competitive with dedicated tools.
GitHub stars n/a Velocity flat History pending Code Plagiarism Detection Apr 28 Code
When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient Ignore
This paper categorizes imperfect rewards in reinforcement learning, showing that some errors can be beneficial for policy gradient optimization.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 28 Pending
Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings Watch
A benchmark for evaluating bandgap prediction models in semiconductors under experimental conditions, highlighting generalization limitations.
GitHub stars n/a Velocity flat History pending Materials Science AI Apr 28 Code
VAE-Inf: A statistically interpretable generative paradigm for imbalanced classification Watch
VAE-Inf is a novel framework that combines variational autoencoders with hypothesis testing for statistically interpretable imbalanced classification, offering robust error control.
GitHub stars n/a Velocity flat History pending Imbalanced Classification Apr 28 Code
Dynamic UGV-UAV Cooperative Path Planning in Uncertain Environments Watch
A cooperative path planning system for ground and aerial vehicles to navigate uncertain road networks, demonstrated in urban environments.
GitHub stars n/a Velocity flat History pending Robotics Apr 28 Code
Measuring the Sensitivity of Classification Models with the Error Sensitivity Profile Watch
A novel metric and toolset to identify and prioritize data errors that most impact machine learning model performance, enabling targeted data cleaning.
GitHub stars n/a Velocity flat History pending ML Model Debugging Apr 28 Code
Language corpora for the Dutch medical domain Watch
The first large-scale Dutch medical language corpus, comprising 35 billion tokens, is now available on Hugging Face for NLP development.
GitHub stars n/a Velocity flat History pending LLM Training Apr 28 Code
Assistants, Not Architects: The Role of LLMs in Networked Systems Design Ignore
A framework that uses structured specifications and optimization to design networked systems, outperforming LLMs in constraint satisfaction and explainability.
GitHub stars n/a Velocity flat History pending AI for Systems Design Apr 28 Code
HotComment: A Benchmark for Evaluating Popularity of Online Comments Ignore
Introduces HotComment, a multimodal benchmark and StyleCmt model for evaluating online comment popularity by considering content quality, trends, and user behavior.
GitHub stars n/a Velocity flat History pending Comment Popularity Benchmark Apr 28 Code
An Investigation of Linguistic Biases in LLM-Based Recommendations Ignore
Investigating linguistic biases in LLM recommendations across different English dialects and Hindi-English code-switching.
GitHub stars n/a Velocity flat History pending LLM Bias Analysis Apr 28 Code
Emotive Architectures: The Role of LLMs in Adjusting Work Environments Ignore
LLMs can dynamically adjust work environments to enhance focus, well-being, and engagement in hybrid settings, while raising ethical considerations.
GitHub stars n/a Velocity flat History pending LLM Applications Apr 28 Code
The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents Ignore
A recursive, sparse mixture-of-experts framework integrated into diffusion models to enhance structured reasoning and text following in image generation.
GitHub stars n/a Velocity flat History pending Multimodal Diffusion Apr 28 Code
Multi-action Tangled Program Graphs for Multi-task Reinforcement Learning with Continuous Control Ignore
A genetic programming algorithm for multi-task reinforcement learning in continuous control environments with interpretable decision flows.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 28 Code
Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills Watch
SkillGuard-Robust is a system for auditing untrusted agent skills, improving security review by combining evidence extraction, semantic verification, and consistency adjudication.
AI Security Apr 28
G-Loss: Graph-Guided Fine-Tuning of Language Models Ignore
A graph-guided loss function that improves fine-tuning of language models by incorporating global semantic structure for more discriminative embeddings.
GitHub stars n/a Velocity flat History pending LLM Fine-Tuning Apr 28 Code
Improving Zero-Shot Offline RL via Behavioral Task Sampling Ignore
Improves zero-shot offline reinforcement learning by extracting task vectors directly from the offline dataset for more principled task sampling.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 28 Code
Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives Ignore
Evaluating LLMs' understanding of embodied cognition and cultural variation using cross-linguistic demonstratives, revealing English-centric biases.
GitHub stars n/a Velocity flat History pending LLM Evaluation Apr 28 Code
CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation Watch
This paper compares traditional and LLM-based methods for estimating recipe nutrients, highlighting a trade-off between accuracy and computational cost.
Food AI Apr 28
PHISHREV: A Hybrid Machine Learning and Post-Hoc Non-monotonic Reasoning Framework for Context-Aware Phishing Website Classification Watch
A hybrid framework combines machine learning with Answer Set Programming for context-aware phishing website classification, allowing for efficient knowledge updates.
Cybersecurity AI Apr 28
Evaluating Risks in Weak-to-Strong Alignment: A Bias-Variance Perspective Ignore
Analyzes weak-to-strong AI alignment failures using a bias-variance perspective to identify risks and improve model robustness.
GitHub stars n/a Velocity flat History pending AI Alignment Apr 28 Code
How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum Watch
A novel loss function continuum that mitigates cold-start stalling in reasoning models by dynamically adjusting supervision commitment.
LLM Training Apr 28
QAROO: AI-Driven Online Task Offloading for Energy-Efficient and Sustainable MEC Networks Ignore
An AI-driven framework for offloading tasks in mobile edge computing networks to optimize energy and computing resources.
GitHub stars n/a Velocity flat History pending MEC Networks Apr 28 Code
The Surprising Effectiveness of Canonical Knowledge Distillation for Semantic Segmentation Ignore
Canonical knowledge distillation significantly improves semantic segmentation performance, achieving state-of-the-art results with smaller models.
Computer Vision Apr 28
Investigation into In-Context Learning Capabilities of Transformers Ignore
An empirical study investigating the scaling behavior and geometric conditions for in-context learning in Transformers.
GitHub stars n/a Velocity flat History pending LLM Theory Apr 28 Code
Semi-Markov Reinforcement Learning for City-Scale EV Ride-Hailing with Feasibility-Guaranteed Actions Ignore
A robust reinforcement learning approach for city-scale EV ride-hailing that guarantees feasibility of dispatch, repositioning, and charging decisions.
RL for Operations Apr 28
Large language models eroding science understanding: an experimental study Ignore
Demonstrates how large language models can be manipulated to spread misinformation by prioritizing fringe scientific material, posing risks to public understanding.
GitHub stars n/a Velocity flat History pending LLM Misinformation Risk Apr 28 Code
From World-Gen to Quest-Line: A Dependency-Driven Prompt Pipeline for Coherent RPG Generation Ignore
A dependency-driven prompt pipeline for generating coherent and structurally consistent RPG content using LLMs.
Generative AI for Games Apr 28
Safe-Support Q-Learning: Learning without Unsafe Exploration Ignore
A Q-learning framework for reinforcement learning that prevents unsafe state visitation during training by leveraging a behavior policy supported on a safe set.
GitHub stars n/a Velocity flat History pending Reinforcement Learning Apr 28 Code
Threat-Oriented Digital Twinning for Security Evaluation of Autonomous Platforms Ignore
A threat-oriented digital twinning methodology for evaluating the cybersecurity of autonomous platforms, adaptable for UAV and space systems.
Security AI Apr 28
ValueAlpha: Agreement-Gated Stress Testing of LLM-Judged Investment Rationales Before Returns Are Observable Ignore
ValueAlpha is a stress-testing protocol for LLM-judged investment rationales, ensuring claims are stable and agreed upon before returns are observable.
AI Finance Evaluation Apr 28
A Faceted Proposal for Transparent Attribution of AI-Assisted Text Production Ignore
A faceted model for transparent attribution of AI-assisted text production, detailing the form, generation, and evaluation of AI interventions.
GitHub stars n/a Velocity flat History pending AI Ethics Apr 28 Code
Medoid Prototype Alignment for Cross-Plant Unknown Attack Detection in Industrial Control Systems Ignore
A medoid prototype alignment framework for detecting unknown attacks in industrial control systems across different plants.
Industrial Control Systems Security Apr 28
Automated Adversarial Collaboration for Advancing Theory Building in the Cognitive Sciences Ignore
An automated adversarial collaboration framework uses LLMs and program synthesis to adjudicate between competing cognitive science theories.
AI for Science Apr 28
The Role of Symmetry in Optimizing Overparameterized Networks Ignore
Explains how overparameterization in neural networks introduces symmetries that improve optimization and make global minima more reachable.
LLM Training Apr 28
Where Did It Go Wrong? Capability-Oriented Failure Attribution for Vision-and-Language Navigation Agents Ignore
A new method for attributing failures in vision-language navigation agents to specific capabilities, improving debugging and agent development.
Agents Apr 28
Co-Writing with AI: An Empirical Study of Diverse Academic Writing Workflows Ignore
An empirical study of how university students integrate AI into diverse academic writing workflows, identifying three distinct usage configurations.
Academic Writing Tools Apr 28
A theoretical framework for bandit problems with smooth payoffs on graphs, applicable to content-based recommendation.
Online Learning Apr 28
DualFact+: A Multimodal Fact Verification Framework for Procedural Video Understanding Ignore
A framework for evaluating factual correctness in procedural video captioning, separating conceptual and contextual facts.
Multimodal AI Apr 28
UnIte: Uncertainty-based Iterative Document Sampling for Domain Adaptation in Information Retrieval Ignore
A novel method for unsupervised domain adaptation in information retrieval that improves document sampling by considering model uncertainty.
Information Retrieval Apr 28
StratFormer: Adaptive Opponent Modeling and Exploitation in Imperfect-Information Games Ignore
A transformer-based meta-agent that learns to model and exploit opponents in imperfect-information games.
Game AI / Agents Apr 28
Spreadsheet Modeling Experiments Using GPTs on Small Problem Statements and the Wall Task Ignore
GPT-based tools show promise for assisting in spreadsheet model building but remain unreliable for professional use due to inconsistencies and workflow challenges.
Spreadsheet AI Apr 28
Plausible but Wrong: A case study on Agentic Failures in Astrophysical Workflows Ignore
This paper evaluates agentic AI failures in astrophysical workflows, highlighting silently incorrect computations as a critical risk and releasing an evaluation framework.
Agents Apr 28
Below-Chance Blindness: Prompted Underperformance in Small LLMs Produces Positional Bias Rather than Answer Avoidance Ignore
This research finds that small LLMs prompted to underperform show positional bias rather than answer avoidance, suggesting positional distribution shifts are a more effective signature of deliberate underperformance than below-chance accuracy.
LLM Behavior Analysis Apr 28
Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories Ignore
Reveals a hidden coupling between gradient directions and linear centroids in neural networks, which is obscured by standard optimizers like AdamW.
LLM Training Apr 28
Toward a Functional Geometric Algebra for Natural Language Semantics Ignore
A functional geometric algebra framework that offers a mathematically superior foundation for natural language semantics, enhancing compositionality and interpretability.
NLP Semantics Apr 28
Making AI-Assisted Grant Evaluation Auditable without Exposing the Model Ignore
Proposes a TEE-based architecture for auditable AI-assisted grant evaluation without exposing the model.
Auditable AI Apr 28
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers Ignore
This research identifies conditional misalignment in language models, where common interventions can hide emergent misbehavior that reappears in specific contexts.
LLM Safety Apr 28
Knowledge Distillation Must Account for What It Loses Ignore
This paper proposes a framework for accountable knowledge distillation, focusing on preserving teacher model capabilities beyond simple task scores to ensure reliability in deployed systems.
LLM Training Apr 28
Three Models of RLHF Annotation: Extension, Evidence, and Authority Ignore
This paper theoretically distinguishes three models of human feedback for large language models to improve annotation strategies.
LLM Alignment Apr 28
Frictive Policy Optimization for LLMs: Epistemic Intervention, Risk-Sensitive Control, and Reflective Alignment Ignore
A framework for learning language model policies that manage risk through explicit intervention actions, focusing on epistemic conduct.
LLM Alignment Apr 28
Optimally Auditing Adversarial Agents Ignore
Develops algorithms for optimal audit policies in principal-agent games to mitigate fraud in resource allocation.
Game Theory Apr 28
Kohn-Sham Hamiltonian from Effective Field Theory: Quasiparticle Band Narrowing from Frozen Core Dynamics Ignore
An effective field theory approach to derive quasiparticle band narrowing from frozen core dynamics, resolving discrepancies in electronic band structure calculations.
Materials Science AI Apr 28
The Nonverbal Syntax Framework: An Evidence-Based Tiered System for Inferring Learner States from Observable Behavioral Cues Ignore
A comprehensive framework for inferring learner cognitive and affective states from observable nonverbal cues, based on a systematic review of existing research.
Learner State Inference Apr 28
Sustained Gradient Alignment Mediates Subliminal Learning in a Multi-Step Setting: Evidence from MNIST Auxiliary Logit Distillation Experiment Ignore
Investigates the persistence of gradient alignment in multi-step settings for subliminal learning, with implications for mitigation methods.
AI Research / Learning Theory Apr 28
Value-Sensitive AI for Prayer: Balancing the Agencies Between Human and AI Agents in Spiritual Context Ignore
This paper explores value-sensitive AI system designs for prayer, emphasizing the preservation of user agency and authenticity in spiritual contexts.
AI Ethics & Spirituality Apr 28
On Halting vs Converging in Recurrent Graph Neural Networks Ignore
Theoretical analysis of convergence and halting conditions in Recurrent Graph Neural Networks.
Graph Neural Networks Apr 28
Verification of Neural Networks (Lecture Notes) Ignore
Theoretical introduction to the verification of neural networks, covering various architectures and verification techniques.
AI Theory Apr 28
AI as Consumer and Participant: A Co-Design Agenda for MBSE Substrates and Methodology Ignore
This paper argues for a co-design agenda between AI tools and MBSE models to enable AI participation beyond simple prompt-based interaction.
AI for Engineering Apr 28