AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery Build Now
AutoResearchBench evaluates AI agents on their ability to discover scientific literature, providing a benchmark for autonomous research capabilities.
GitHub 28 stars Velocity flat History 1 snapshot AI and Data Tools Apr 28 Pending High viability
Frontier Coding Agents Can Now Implement an AlphaZero Self-Play Machine Learning Pipeline For Connect Four That Performs Comparably to an External Solver Build Now
A benchmark for frontier coding agents to autonomously implement ML pipelines, demonstrating significant progress in AI self-improvement capabilities.
GitHub 0 stars Velocity flat History 1 snapshot AI Research Tools Apr 27 Pending High viability
BifDet: A 3D Bifurcation Detection Dataset for Airway-Tree Modeling Build Now
A new dataset and fine-tuned models for 3D airway bifurcation detection in CT scans, addressing a critical gap for respiratory disease analysis.
GitHub 15 stars Velocity flat History 1 snapshot Medical AI Apr 27 Pending High viability
OxyGent: Making Multi-Agent Systems Modular, Observable, and Evolvable via Oxy Abstraction Build Now
OxyGent is an open-source framework for building modular, observable, and evolvable multi-agent systems with a Lego-like assembly paradigm.
GitHub 1862 stars Velocity flat History 1 snapshot Agents Apr 28 Pending High viability
Faithfulness-QA: A Counterfactual Entity Substitution Dataset for Training Context-Faithful RAG Models Build Now
A new dataset and pipeline for training Retrieval-Augmented Generation models to prioritize retrieved context over internal knowledge.
GitHub 2 stars Velocity flat History 1 snapshot RAG Apr 28 Pending High viability
Large Language Models Explore by Latent Distilling Build Now
Exploratory Sampling (ESamp) is a decoding approach for LLMs that boosts semantic diversity and reasoning performance by using a lightweight distiller to predict novelty signals, with less than 5% overhead.
GitHub 11 stars Velocity flat History 1 snapshot LLM Exploration Apr 27 Pending High viability
DATAREEL: Automated Data-Driven Video Story Generation with Animations Build Now
Automated platform for generating animated video stories from data.
GitHub 0 stars Velocity flat History 1 snapshot Generative Video Apr 28 Pending High viability
spectroxide: A code package for computing cosmic microwave background spectral distortions Build Now
An AI-assisted scientific computing package for calculating cosmic microwave background spectral distortions, offering a novel approach to scientific software development.
GitHub 5 stars Velocity flat History 1 snapshot Scientific Computing Apr 27 Pending High viability
Learning Illumination Control in Diffusion Models Build Now
An open-source pipeline for learning illumination control in diffusion models, enabling precise image adjustments via natural language instructions.
GitHub 1870 stars Velocity flat History 1 snapshot Generative Image Editing Apr 27 Pending High viability
S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models Build Now
A framework for distilling large audio foundation models into smaller, deployable versions using only output embeddings, reducing size by up to 61x while retaining 96% performance.
GitHub 3 stars Velocity flat History 1 snapshot Audio Foundation Models Apr 27 Pending High viability
LegalMidm: Use-Case-Driven Legal Domain Specialization for Korean Large Language Model Build Now
A specialized Korean legal LLM built with domain experts and use-case-driven data curation for practical legal tasks.
GitHub 1870 stars Velocity flat History 1 snapshot LLM Specialization Apr 28 Pending High viability
DiRe-RAPIDS: Topology-faithful dimensionality reduction at scale Build Now
A topology-faithful dimensionality reduction method that preserves global structure better than UMAP at scale, with a new benchmark for evaluation.
GitHub 9 stars Velocity flat History 1 snapshot Dimensionality Reduction Apr 28 Pending High viability
GAIA-v2-LILT: Multilingual Adaptation of Agent Benchmark beyond Translation Build Now
A refined workflow and dataset (GAIA-v2-LILT) for creating valid multilingual agent benchmarks that align functionally and culturally, improving agent performance by up to 32.7% over machine-translated versions.
GitHub 0 stars Velocity flat History 1 snapshot Multilingual Agent Benchmarks Apr 27 Pending High viability
Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models Build Now
A plug-and-play method to reduce hallucinations in vision-language models by intervening during the prefill stage, improving initial representations.
GitHub 3 stars Velocity flat History 1 snapshot LLM Hallucination Mitigation Apr 28 Pending High viability
Health System Scale Semantic Search Across Unstructured Clinical Notes Build Now
A deployed semantic search system for health systems that retrieves clinical information from millions of notes with sub-second latency and significant cost savings.
GitHub 1 star Velocity flat History 1 snapshot Medical AI Apr 28 Pending High viability
TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline Reinforcement Learning Build Now
TSN-Affinity is a continual offline reinforcement learning method that uses similarity-driven parameter reuse to prevent catastrophic forgetting and improve multi-task performance.
GitHub 0 stars Velocity flat History 1 snapshot Continual RL Apr 28 Pending High viability
From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems Build Now
A zero-trust semantic gateway for secure enterprise AI-native systems that validates autonomous agents as stochastic state-transition systems.
GitHub 0 stars Velocity flat History 1 snapshot Agents Apr 28 Pending High viability
Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate Build Now
A post-training procedure that distills multi-agent debate into a single LLM for improved reasoning and controllable behavior, reducing token usage by up to 93%.
GitHub 1 star Velocity flat History 1 snapshot LLM Reasoning Apr 27 Pending High viability
QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention Build Now
QFlash enables end-to-end integer-only FlashAttention for Vision Transformers, significantly speeding up computation and reducing energy consumption.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 28 Pending High viability
AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices Watch
AHASD is a novel asynchronous heterogeneous architecture for mobile LLM speculative decoding, significantly improving throughput and energy efficiency.
GitHub 4 stars Velocity flat History 1 snapshot LLM Inference Optimization Apr 28 Pending
Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest Build Now
Cooperate to Compete (C2C) is a novel multi-agent environment and dataset for studying strategic negotiation and coordination in LM-based agents, with implications for real-world deployments.
GitHub 0 stars Velocity flat History 1 snapshot Multi-Agent Systems Apr 28 Pending High viability
Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization Build Now
Agentic Architect optimizes microarchitecture design using AI-driven evolutionary models.
GitHub stars n/a Velocity flat History 1 snapshot AI for Architecture Design Apr 28 Code High viability
M$^3$-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering Watch
A challenging benchmark for multimodal, multi-entity, multi-hop visual question answering to evaluate large language models' reasoning capabilities.
GitHub 0 stars Velocity flat History 1 snapshot Multimodal VQA Apr 28 Pending
FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices Build Now
Enhancing federated learning of LLMs on edge devices with Fisher-Guided Token Quantization for significant communication efficiency and speedup.
GitHub stars n/a Velocity flat History 1 snapshot Federated LLM Fine-Tuning Apr 28 Code High viability
Recursive Multi-Agent Systems Build Now
A recursive multi-agent framework that enhances reasoning and efficiency in complex tasks through iterative collaboration and latent state transfer.
GitHub stars n/a Velocity flat History 1 snapshot Multi-Agent Systems Apr 28 Code High viability
Action-Aware Generative Sequence Modeling for Short Video Recommendation Build Now
A tailored recommendation engine using action-aware generative modeling for short video platforms to enhance user retention.
GitHub stars n/a Velocity flat History 1 snapshot Video Recommendation Apr 28 Code High viability
The Surprising Effectiveness of Canonical Knowledge Distillation for Semantic Segmentation Watch
Canonical knowledge distillation significantly improves semantic segmentation performance, achieving state-of-the-art results with smaller models by matching compute budgets.
GitHub 713 stars Velocity flat History 1 snapshot Computer Vision Apr 28 Pending
LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation Build Now
LLM-ReSum is a self-reflective summarization framework that integrates LLM-based evaluation and generation in a feedback loop, improving factual accuracy by up to 33% and coverage by 39%.
GitHub stars n/a Velocity flat History 1 snapshot LLM Summarization Apr 28 Code High viability
BenchGuard: Who Guards the Benchmarks? Automated Auditing of LLM Agent Benchmarks Build Now
BenchGuard is an automated auditing framework that uses LLMs to find critical flaws in agent benchmarks, making AI development more reliable and cost-effective.
GitHub stars n/a Velocity flat History 1 snapshot AI Benchmarking Apr 27 Code High viability
Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling Build Now
Open-source, highly efficient multilingual Mixture-of-Experts language models with a strong performance-to-compute ratio.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28 Code High viability
SAFEdit: Does Multi-Agent Decomposition Resolve the Reliability Challenges of Instructed Code Editing? Build Now
A multi-agent framework that decomposes instructed code editing into specialized roles to improve reliability and reduce unintended changes.
GitHub stars n/a Velocity flat History 1 snapshot Code Editing Agents Apr 28 Code High viability
Co-Director: Agentic Generative Video Storytelling Build Now
A hierarchical multi-agent framework that optimizes video storytelling for coherent narratives and personalized advertising.
GitHub stars n/a Velocity flat History 1 snapshot Generative Video Agents Apr 27 Code High viability
RESTestBench: A Benchmark for Evaluating the Effectiveness of LLM-Generated REST API Test Cases from NL Requirements Build Now
A benchmark and metric for evaluating LLM-generated REST API tests based on natural language requirements, improving functional validation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Testing Apr 28 Code High viability
Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling Build Now
Luminol-AIDetect is a fast zero-shot detector of machine-generated text that relies on perplexity measured under text shuffling.
GitHub stars n/a Velocity flat History 1 snapshot AI Content Detection Apr 28 Code High viability
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization Build Now
Improves generative model alignment with complex human preferences by using a semi-supervised approach to handle noisy preference data.
GitHub stars n/a Velocity flat History 1 snapshot Generative Image/Video Apr 27 Pending High viability
The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models Build Now
A multi-source benchmark and evaluation pipeline for assessing the structured output quality of large language models across text, image, and audio inputs.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 28 Code High viability
ViPO: Visual Preference Optimization at Scale Build Now
Scales visual preference optimization for generative models by introducing a novel adaptive algorithm and a massive, high-quality preference dataset.
GitHub 0 stars Velocity flat History 1 snapshot Generative Image/Video Apr 27 Pending High viability
MAIC-UI: Making Interactive Courseware with Generative UI Ignore
A tool for creating interactive courseware with a generative user interface.
GitHub 32 stars Velocity flat History 1 snapshot Education Technology Apr 28 Pending
BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate Build Now
A debate framework that generates synthetic training data for custom LLM guardrails, eliminating the need for extensive human annotation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Guardrails Apr 28 Code High viability
From Insight to Action: A Novel Framework for Interpretability-Guided Data Selection in Large Language Models Build Now
A framework that uses interpretability insights to select data that maximally activates internal task features, leading to highly efficient LLM fine-tuning.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 28 Code High viability
SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton Build Now
A 3D hierarchical music generation model that creates controllable orchestral pieces with advanced harmony and reduced dissonance.
GitHub stars n/a Velocity flat History 1 snapshot Generative Music Apr 28 Code High viability
PI-TTA: Physics-Informed Source-Free Test-Time Adaptation for Robust Human Activity Recognition on Mobile Devices Build Now
Physics-informed test-time adaptation for robust human activity recognition on mobile devices, improving accuracy and stability.
GitHub stars n/a Velocity flat History 1 snapshot Human Activity Recognition Apr 28 Code High viability
SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials Build Now
A benchmark and fine-tuned LLM for automatically evaluating K-12 science instructional materials.
GitHub stars n/a Velocity flat History 1 snapshot AI for Education Apr 28 Code High viability
Sparse Personalized Text Generation with Multi-Trajectory Reasoning Build Now
A reinforcement learning framework for cold-start LLM personalization that jointly reasons over user writing styles and topic preferences.
GitHub stars n/a Velocity flat History 1 snapshot LLM Personalization Apr 27 Code High viability
Adaptive Prompt Embedding Optimization for LLM Jailbreaking Build Now
This research presents a novel white-box jailbreaking technique for LLMs that directly optimizes prompt embeddings, outperforming existing methods by preserving prompt semantics while achieving higher success rates.
GitHub stars n/a Velocity flat History 1 snapshot LLM Security Apr 27 Code High viability
SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents Build Now
Lightweight and fast prompt injection detection for screenshot-based web agents, outperforming larger models.
GitHub stars n/a Velocity flat History 1 snapshot Web Agents Apr 28 Code High viability
Toward Scalable Terminal Task Synthesis via Skill Graphs Build Now
An automated framework for synthesizing diverse terminal tasks using skill graphs to train more capable command-line agents.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 28 Code High viability
Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings Build Now
A benchmark for reliable semiconductor bandgap prediction, bridging the gap between computational models and experimental measurements.
GitHub stars n/a Velocity flat History 1 snapshot Materials Science AI Apr 28 Code High viability
Evaluating Risks in Weak-to-Strong Alignment: A Bias-Variance Perspective Build Now
Analyzes weak-to-strong AI alignment failures using a bias-variance perspective, identifying strong-model variance as a key predictor of deception.
GitHub stars n/a Velocity flat History 1 snapshot AI Alignment Apr 28 Code High viability
EVT-Based Generative AI for Tail-Aware Channel Estimation Build Now
Generative AI integrated with extreme value theory for tail-aware channel estimation in URLLC networks.
GitHub stars n/a Velocity flat History 1 snapshot Wireless AI Apr 27 Code High viability
asRoBallet: Closing the Sim2Real Gap via Friction-Aware Reinforcement Learning for Underactuated Spherical Dynamics Build Now
A friction-aware reinforcement learning framework for zero-shot sim-to-real transfer in humanoid ballbots, enabling expressive maneuvers via a generalized iOS ecosystem.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 27 Code High viability
Can Code Evaluation Metrics Detect Code Plagiarism? Build Now
A tool to evaluate code plagiarism detection metrics, offering a path to more reliable academic integrity checks in software engineering.
GitHub stars n/a Velocity flat History 1 snapshot Code Analysis Apr 28 Code High viability
Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation Build Now
An LLM agent framework that automates end-to-end 3D cutscene generation by integrating with game engines and orchestrating specialist sub-agents.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 28 Code High viability
Think Before You Act -- A Neurocognitive Governance Model for Autonomous AI Agents Build Now
A neurocognitive governance framework for autonomous AI agents that embeds self-governance principles into their reasoning, achieving 95% compliance accuracy in a retail supply chain workflow.
GitHub stars n/a Velocity flat History 1 snapshot AI Agents Apr 28 Code High viability
GPT-Image-2 in the Wild: A Twitter Dataset of Self-Reported AI-Generated Images from the First Week of Deployment Build Now
The first dataset of GPT-Image-2 generated images from Twitter, analyzing their characteristics and the challenges of content provenance.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI Apr 28 Code High viability
At the Edge of the Heart: ULP FPGA-Based CNN for On-Device Cardiac Feature Extraction in Smart Health Sensors for Astronauts Build Now
An ultra-low-power FPGA-based CNN for on-device cardiac feature extraction in smart health sensors, enabling autonomous health monitoring for astronauts.
GitHub stars n/a Velocity flat History 1 snapshot Edge AI / Health Tech Apr 28 Code High viability
Analyzing LLM Reasoning to Uncover Mental Health Stigma Build Now
Analyzing LLM reasoning steps to uncover and categorize hidden mental health stigma, offering a more nuanced evaluation than traditional methods.
GitHub stars n/a Velocity flat History 1 snapshot LLM Bias Detection Apr 27 Code High viability
Libra-VLA: Achieving Learning Equilibrium via Asynchronous Coarse-to-Fine Dual-System Build Now
Libra-VLA is a dual-system VLA architecture for robotics that decouples coarse-to-fine planning and refinement for scalable, robust, and responsive open-world manipulation.
GitHub stars n/a Velocity flat History 1 snapshot Robotic Manipulation Apr 27 Code High viability
R$^3$-SQL: Ranking Reward and Resampling for Text-to-SQL Build Now
R3-SQL is a Text-to-SQL framework that improves query ranking consistency and recall through unified reward and agentic resampling, achieving state-of-the-art performance.
GitHub stars n/a Velocity flat History 1 snapshot Text-to-SQL Apr 28 Code High viability
PSI-Bench: Towards Clinically Grounded and Interpretable Evaluation of Depression Patient Simulators Build Now
PSI-Bench provides clinically grounded and interpretable evaluation for depression patient simulators, identifying key limitations and guiding future development.
GitHub stars n/a Velocity flat History 1 snapshot Mental Health AI Apr 28 Code High viability
Towards Agentic Investigation of Security Alerts Build Now
An agentic workflow using LLMs and structured queries to automate the initial investigation of security alerts, improving accuracy and reducing analyst workload.
GitHub stars n/a Velocity flat History 1 snapshot Security AI Apr 28 Code High viability
ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations Build Now
A speaker-adaptive network for emotion recognition in conversations that calibrates input, gates modality trust, and regularizes speaker features for improved accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Emotion Recognition Apr 28 Code High viability
Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models Build Now
An empirical study of uncertainty estimation for audio-aware large language models to improve reliability and detect hallucinations.
GitHub stars n/a Velocity flat History 1 snapshot Audio LLMs Apr 28 Code High viability
Learning Generalizable Multimodal Representations for Software Vulnerability Detection Build Now
A multimodal framework that uses code and comments to significantly improve software vulnerability detection accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Code Intelligence Apr 28 Code High viability
DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams Build Now
DRAGON is a new benchmark and dataset for evaluating evidence-grounded visual reasoning in diagrams, addressing limitations of current vision-language models.
GitHub stars n/a Velocity flat History 1 snapshot Visual Reasoning Benchmark Apr 28 Code High viability
JURY-RL: Votes Propose, Proofs Dispose for Label-Free RLVR Build Now
A label-free reinforcement learning framework for LLMs that decouples answer proposal from reward verification, improving mathematical reasoning and code generation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Reasoning Apr 28 Code High viability
DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing Build Now
A decoupled reinforcement learning framework for reasoning-driven image editing that optimizes planning independently from generation.
GitHub stars n/a Velocity flat History 1 snapshot AI for Image Editing Apr 28 Code High viability
Sample-efficient Neuro-symbolic Proximal Policy Optimization Build Now
A sample-efficient neuro-symbolic PPO that uses logical policy specifications to guide learning in sparse-reward reinforcement learning domains.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 28 Code High viability
SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring Build Now
SIEVES improves multimodal LLM coverage on out-of-distribution tasks by scoring the quality of visual evidence used in predictions.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 28 Code High viability
Scalable Inference Architectures for Compound AI Systems: A Production Deployment Study Build Now
A production-ready inference architecture for scalable and cost-effective deployment of complex, multi-model AI systems.
GitHub stars n/a Velocity flat History 1 snapshot AI Infrastructure Apr 28 Code High viability
CoRE: Concept-Reasoning Expansion for Continual Brain Lesion Segmentation Build Now
A continual learning framework for brain lesion segmentation that integrates visual features with structured concepts for improved adaptation and interpretability.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 28 Code High viability
ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models Build Now
Enables parameter-efficient multi-anchor word representations in large language models through novel projection and encoding techniques.
GitHub stars n/a Velocity flat History 1 snapshot LLM Embeddings Apr 27 Code High viability
No Pedestrian Left Behind: Real-Time Detection and Tracking of Vulnerable Road Users for Adaptive Traffic Signal Control Build Now
A real-time adaptive traffic signal system that uses YOLOv12 and ByteTrack to extend crossing times for vulnerable road users, improving safety by 71.4%.
GitHub stars n/a Velocity flat History 1 snapshot Computer Vision Apr 28 Code High viability
Measuring the Sensitivity of Classification Models with the Error Sensitivity Profile Build Now
A tool to measure model sensitivity to data errors, enabling prioritized data cleaning for improved machine learning performance.
GitHub stars n/a Velocity flat History 1 snapshot MLOps Apr 28 Code High viability
CORAL: Adaptive Retrieval Loop for Culturally-Aligned Multilingual RAG Build Now
CORAL is an adaptive retrieval methodology for multilingual RAG that iteratively refines the retrieval space and the query to produce culturally aligned answers, improving accuracy by up to 3.58 percentage points on low-resource languages.
GitHub stars n/a Velocity flat History 1 snapshot Multilingual RAG Apr 28 Code High viability
The Forensic Cost of Watermark Removal Build Now
A watermark removal detection method that identifies hidden statistical artifacts, making current removal techniques forensically detectable.
GitHub stars n/a Velocity flat History 1 snapshot Digital Forensics Apr 28 Code High viability
Language corpora for the Dutch medical domain Build Now
A large-scale Dutch medical language corpus released on Hugging Face to enable NLP development in the Dutch medical domain.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28 Code High viability
Semantic Layers for Reliable LLM-Powered Data Analytics: A Paired Benchmark of Accuracy and Hallucination Across Three Frontier Models Build Now
A benchmark demonstrating that providing explicit semantic context significantly improves LLM accuracy for data analytics, reducing hallucinations.
GitHub stars n/a Velocity flat History 1 snapshot LLM Data Analytics Apr 28 Code High viability
How Can Reinforcement Learning Achieve Expert-level Placement? Watch
A reinforcement learning framework that learns expert rewards directly from expert chip layouts to achieve expert-level placement.
GitHub stars n/a Velocity flat History 1 snapshot Chip Design AI Apr 28 Code
ADEMA: A Knowledge-State Orchestration Architecture for Long-Horizon Knowledge Synthesis with LLMAgents Build Now
An orchestration architecture for LLM agents that improves long-horizon knowledge synthesis by managing knowledge states and evidence chains.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 28 Code High viability
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence Watch
Nemotron 3 Nano Omni is an open-source multimodal AI model with native audio support, offering improved accuracy and efficiency for real-world applications like document understanding and agentic computer use.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 27 High viability
One-shot emergency psychiatric triage across 15 frontier AI chatbots Watch
Evaluating 15 frontier AI chatbots on psychiatric triage, this study reveals high accuracy for emergencies but significant over-triage for lower-risk presentations.
GitHub stars n/a Velocity flat History 1 snapshot Medical AI Apr 28 Code
The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents Watch
A recursive sparse mixture-of-experts framework for multimodal diffusion models that enhances structured reasoning in text-to-image generation.
GitHub stars n/a Velocity flat History 1 snapshot Generative Vision Apr 28 Code
RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion Watch
A novel framework that decouples retrieval and reranking for multi-modal knowledge graph completion, improving accuracy.
GitHub stars n/a Velocity flat History 1 snapshot Knowledge Graph Completion Apr 28 Code
Assessing Y-Axis Influence: Bias in Multimodal Language Models on Chart-to-Table Translation Watch
A framework to analyze and mitigate y-axis bias in multimodal language models for chart-to-table translation, improving performance and fairness.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 27 Code
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling Ignore
Memory-enhanced LLM agents that use decentralized debate to build optimization models for operations research problems.
GitHub 0 stars Velocity flat History 1 snapshot Optimization Modeling Apr 28 Pending
TrialCalibre: A Fully Automated Causal Engine for RCT Benchmarking and Observational Trial Calibration Watch
TrialCalibre is a conceptualized multi-agent system to automate and scale causal inference for RCT benchmarking and observational trial calibration.
GitHub stars n/a Velocity flat History 1 snapshot Causal Inference AI Apr 28 Code
Towards Unified Multi-task EEG Analysis with Low-Rank Adaptation Watch
A multi-task EEG analysis framework using low-rank adaptation to efficiently adapt pre-trained models to multiple downstream tasks.
GitHub stars n/a Velocity flat History 1 snapshot EEG Analysis Apr 28 Code
GraphPL: Leveraging GNN for Efficient and Robust Modalities Imputation in Patchwork Learning Watch
GraphPL is a graph neural network approach for unsupervised modality imputation in patchwork learning settings that demonstrates state-of-the-art performance on benchmark and real-world datasets.
GitHub stars n/a Velocity flat History 1 snapshot Multi-modal Learning Apr 28 Code
Cross-Lingual Jailbreak Detection via Semantic Codebooks Watch
A language-agnostic system to detect cross-lingual LLM jailbreaks using semantic similarity without retraining.
GitHub stars n/a Velocity flat History 1 snapshot LLM Safety Apr 28 Code
Dual-Track CoT: Budget-Aware Stepwise Guidance for Small LMs Ignore
Improving small language model reasoning under compute and token constraints through stepwise guidance.
GitHub 1 star Velocity flat History 1 snapshot LLM Reasoning Apr 27 Pending
Dynamic UGV-UAV Cooperative Path Planning in Uncertain Environments Watch
Cooperative path planning for ground and aerial vehicles to navigate uncertain road networks, useful for disaster response.
GitHub stars n/a Velocity flat History 1 snapshot Robotics Apr 28 Code
Training Transformers as a Universal Computer Watch
Demonstrates that a small transformer can be trained to execute programs in a universal programming language, acting as a universal computer.
GitHub stars n/a Velocity flat History 1 snapshot AI as Universal Computer Apr 28 Code
Doing More With Less: Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling Watch
This research demonstrates that unstructured pruning can augment test-time reasoning performance in LLMs, challenging the notion that pruning always degrades capabilities.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 28 Code
VAE-Inf: A statistically interpretable generative paradigm for imbalanced classification Watch
VAE-Inf is a novel framework for imbalanced classification that combines variational autoencoders with hypothesis testing for statistically interpretable and robust predictions.
GitHub stars n/a Velocity flat History 1 snapshot Imbalanced Classification Apr 28 Code
When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient Ignore
This paper categorizes imperfect rewards in policy gradient optimization, showing that errors can be benign or beneficial, and proposes new evaluation metrics for reward models.
GitHub 0 stars Velocity flat History 1 snapshot LLM Training Apr 28 Pending
Generative UI as an Accessibility Bridge: Lessons from C2C E-Commerce Watch
Generative UI can create adaptive interfaces for e-commerce platforms, bridging accessibility gaps for users with disabilities.
GitHub stars n/a Velocity flat History 1 snapshot Generative UI Apr 28 Code
Assistants, Not Architects: The Role of LLMs in Networked Systems Design Ignore
A framework that uses structured specifications and optimization to design networked systems, outperforming LLMs in constraint satisfaction.
GitHub stars n/a Velocity flat History 1 snapshot AI for Systems Design Apr 28 Code
CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation Watch
An LLM-powered system for accurate recipe nutrient estimation, balancing precision with computational efficiency for dietary monitoring.
GitHub stars n/a Velocity flat History 1 snapshot Food AI Apr 28
G-Loss: Graph-Guided Fine-Tuning of Language Models Ignore
A novel graph-guided loss function that improves fine-tuning of language models by incorporating global semantic structure for better classification accuracy.
GitHub stars n/a Velocity flat History 1 snapshot LLM Fine-Tuning Apr 28 Code
HotComment: A Benchmark for Evaluating Popularity of Online Comments Ignore
Introduces HotComment, a multimodal benchmark and StyleCmt model for evaluating online comment popularity by considering content quality, trends, and user behavior.
GitHub stars n/a Velocity flat History 1 snapshot Comment Popularity Benchmark Apr 28 Code
UnIte: Uncertainty-based Iterative Document Sampling for Domain Adaptation in Information Retrieval Ignore
A novel document sampling method for unsupervised domain adaptation in information retrieval that prioritizes model uncertainty to improve generalization.
GitHub stars n/a Velocity flat History 1 snapshot Information Retrieval Apr 28 Pending
An Investigation of Linguistic Biases in LLM-Based Recommendations Ignore
An investigation into how linguistic biases in different English dialects affect LLM recommendations for restaurants and products.
GitHub stars n/a Velocity flat History 1 snapshot LLM Bias Analysis Apr 28 Code
PHISHREV: A Hybrid Machine Learning and Post-Hoc Non-monotonic Reasoning Framework for Context-Aware Phishing Website Classification Watch
A hybrid framework that combines machine learning with non-monotonic reasoning to improve context-aware phishing website classification and allow efficient knowledge updates.
GitHub stars n/a Velocity flat History 1 snapshot Security AI Apr 28
Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives Ignore
Evaluating LLMs' understanding of embodied cognition and cultural variation using cross-linguistic demonstratives.
GitHub stars n/a Velocity flat History 1 snapshot LLM Evaluation Apr 28 Code
Multi-action Tangled Program Graphs for Multi-task Reinforcement Learning with Continuous Control Ignore
A genetic programming algorithm for multi-task reinforcement learning in continuous control environments with interpretable decision flows.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 28 Code
Below-Chance Blindness: Prompted Underperformance in Small LLMs Produces Positional Bias Rather than Answer Avoidance Ignore
This research investigates positional bias in small LLMs as a failure mode for detecting deliberate underperformance, suggesting positional distribution shifts as a more effective signature.
GitHub 0 stars Velocity flat History 1 snapshot LLM Behavior Analysis Apr 28 Pending
Improving Zero-Shot Offline RL via Behavioral Task Sampling Ignore
Improves zero-shot reinforcement learning by sampling task vectors directly from offline datasets, enhancing generalization.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 28 Code
How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum Watch
A novel loss function continuum for reasoning models that mitigates cold-start issues in fine-tuning by balancing exploitation and density estimation.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28
Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories Ignore
Reveals hidden linear-centroid coupling in neural networks by analyzing gradient directions, impacting feature formation and grokking.
GitHub 0 stars Velocity flat History 1 snapshot LLM Training Apr 28 Pending
Investigation into In-Context Learning Capabilities of Transformers Ignore
A systematic empirical study investigating the scaling behavior and geometric conditions for in-context learning in Transformers.
GitHub 0 stars Velocity flat History 1 snapshot LLM Theory & Scaling Apr 28 Pending
QAROO: AI-Driven Online Task Offloading for Energy-Efficient and Sustainable MEC Networks Ignore
An AI-driven framework for online task offloading in mobile edge computing networks that optimizes computing and energy resources.
GitHub stars n/a Velocity flat History 1 snapshot MEC Networks Apr 28 Code
Scalable Secure Biometric Authentication without Auxiliary Identifiers Ignore
A novel system combining AI and cryptography for scalable, secure biometric authentication without auxiliary identifiers, protecting against data breaches.
GitHub stars n/a Velocity flat History 1 snapshot Secure Biometrics Apr 27
Semi-Markov Reinforcement Learning for City-Scale EV Ride-Hailing with Feasibility-Guaranteed Actions Ignore
A semi-Markov reinforcement learning approach for city-scale EV ride-hailing that guarantees feasibility of actions and optimizes net profit.
GitHub stars n/a Velocity flat History 1 snapshot Autonomous Systems Apr 28
Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills Ignore
SkillGuard-Robust is a novel system for auditing untrusted agent skills, improving security review consistency and robustness against semantic-preserving rewrites.
GitHub stars n/a Velocity flat History 1 snapshot Agent Security Apr 28
StratFormer: Adaptive Opponent Modeling and Exploitation in Imperfect-Information Games Ignore
A transformer-based meta-agent that learns to model and exploit opponents in imperfect-information games by adapting its policy towards best-response exploitation.
GitHub stars n/a Velocity flat History 1 snapshot Game AI Apr 28
Emotive Architectures: The Role of LLMs in Adjusting Work Environments Ignore
Investigating the use of LLMs to create dynamic, emotionally receptive work environments by adjusting physical and virtual settings.
GitHub stars n/a Velocity flat History 1 snapshot Workplace AI Apr 28 Code
Threat-Oriented Digital Twinning for Security Evaluation of Autonomous Platforms Ignore
A methodology for creating digital twins to evaluate the security of autonomous platforms against various threats.
GitHub stars n/a Velocity flat History 1 snapshot Autonomous Systems Security Apr 28
Automated Adversarial Collaboration for Advancing Theory Building in the Cognitive Sciences Ignore
An automated adversarial collaboration framework uses LLMs and program synthesis to adjudicate between competing scientific theories.
GitHub stars n/a Velocity flat History 1 snapshot AI for Science Apr 28
Large language models eroding science understanding: an experimental study Ignore
Demonstrates how large language models can be manipulated to spread misinformation by prioritizing fringe scientific material, posing risks to public understanding.
GitHub stars n/a Velocity flat History 1 snapshot LLM Misinformation Risk Apr 28 Code
From World-Gen to Quest-Line: A Dependency-Driven Prompt Pipeline for Coherent RPG Generation Ignore
A dependency-driven prompt pipeline for generating coherent and scalable RPG content using LLMs.
GitHub stars n/a Velocity flat History 1 snapshot Generative AI for Games Apr 28
Toward a Science of Intent: Closure Gaps and Delegation Envelopes for Open-World AI Agents Ignore
A framework for transforming partially specified human intent into inspectable artifacts for open-world AI agents by formalizing closure gaps and defining delegation envelopes.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 27 Code
Safe-Support Q-Learning: Learning without Unsafe Exploration Ignore
A Q-learning framework for safe reinforcement learning that prevents unsafe state visitation during training.
GitHub stars n/a Velocity flat History 1 snapshot Reinforcement Learning Apr 28 Code
Sustained Gradient Alignment Mediates Subliminal Learning in a Multi-Step Setting: Evidence from MNIST Auxiliary Logit Distillation Experiment Ignore
Investigates the persistence of gradient alignment in multi-step settings for subliminal learning, showing current mitigation methods may not reliably suppress trait acquisition.
GitHub 1870 stars Velocity flat History 1 snapshot AI Theory / ML Research Apr 28 Pending
Faithful Autoformalization via Roundtrip Verification and Repair Ignore
Ensuring LLM formalizations are faithful through roundtrip verification and targeted repair.
GitHub stars n/a Velocity flat History 1 snapshot LLM Formalization Apr 27
A Faceted Proposal for Transparent Attribution of AI-Assisted Text Production Ignore
A faceted model for transparent attribution of AI-assisted text production, detailing the form, generation, evaluation, intent, control, and traceability of AI interventions.
GitHub stars n/a Velocity flat History 1 snapshot AI Ethics Apr 28 Code
Medoid Prototype Alignment for Cross-Plant Unknown Attack Detection in Industrial Control Systems Ignore
A medoid prototype alignment framework for detecting unknown attacks in industrial control systems across different plants.
GitHub stars n/a Velocity flat History 1 snapshot Industrial Control Systems Security Apr 28
Where Did It Go Wrong? Capability-Oriented Failure Attribution for Vision-and-Language Navigation Agents Ignore
A novel approach to test and attribute failures in embodied agents by focusing on specific capabilities.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 28
A theoretical framework for bandit problems with smooth payoffs on graphs, applicable to content-based recommendation.
GitHub stars n/a Velocity flat History 1 snapshot Online Learning Apr 28
SUDP: Secret-Use Delegation Protocol for Agentic Systems Ignore
A protocol for secure delegation of user secrets in agentic systems to enable authorized operations without exposing reusable authority.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 27
Co-Writing with AI: An Empirical Study of Diverse Academic Writing Workflows Ignore
An empirical study exploring how university students integrate AI tools into diverse academic writing workflows, identifying three distinct usage configurations.
GitHub stars n/a Velocity flat History 1 snapshot Academic Writing Tools Apr 28
The Role of Symmetry in Optimizing Overparameterized Networks Ignore
Explains how overparameterization in neural networks improves optimization through weight-space symmetries.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28
DualFact+: A Multimodal Fact Verification Framework for Procedural Video Understanding Ignore
A framework for evaluating factual correctness in procedural video captions, highlighting limitations of current multimodal models.
GitHub stars n/a Velocity flat History 1 snapshot Multimodal AI Apr 28
Learning with Embedded Linear Equality Constraints via Variational Bayesian Inference Ignore
A Bayesian framework to embed linear relationships into machine learning for improved uncertainty estimates and adherence to physical constraints.
GitHub stars n/a Velocity flat History 1 snapshot Scientific ML Apr 27
Spreadsheet Modeling Experiments Using GPTs on Small Problem Statements and the Wall Task Ignore
GPT-based tools show promise for assisting in spreadsheet model building but remain unreliable for professional use due to inconsistencies and workflow challenges.
GitHub stars n/a Velocity flat History 1 snapshot Spreadsheet AI Apr 28
Compute Aligned Training: Optimizing for Test Time Inference Ignore
This paper introduces a new training methodology for LLMs that aligns training objectives with test-time inference strategies to improve performance.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 27
ValueAlpha: Agreement-Gated Stress Testing of LLM-Judged Investment Rationales Before Returns Are Observable Ignore
ValueAlpha is a stress-testing protocol for LLM-judged investment rationales, ensuring claims are stable and agreed upon before returns are observable.
GitHub stars n/a Velocity flat History 1 snapshot AI Finance Evaluation Apr 28
MultiHedge: Adaptive Coordination via Retrieval-Augmented Control Ignore
MultiHedge is a hybrid architecture using LLMs and retrieval to improve robustness and stability in modular decision-making systems.
GitHub stars n/a Velocity flat History 1 snapshot LLM Coordination Apr 27
Plausible but Wrong: A case study on Agentic Failures in Astrophysical Workflows Ignore
This paper analyzes agentic AI failures in astrophysical workflows, highlighting silent incorrect computations as a critical risk and releasing an evaluation framework.
GitHub stars n/a Velocity flat History 1 snapshot Agents Apr 28
Making AI-Assisted Grant Evaluation Auditable without Exposing the Model Ignore
A TEE-based architecture for auditable AI grant evaluation without exposing the model, focusing on remote attestation and prompt injection risks.
GitHub stars n/a Velocity flat History 1 snapshot AI Auditing Apr 28
Rethinking Layer Redundancy in Large Language Models: Calibration Objectives and Search for Depth Pruning Ignore
Investigates the functional perspective of layer redundancy in LLMs, showing calibration objectives influence pruning more than search algorithms.
GitHub stars n/a Velocity flat History 1 snapshot LLM Optimization Apr 27
Barriers and Enablers of Online Instruction in Hospitality Education in the Philippines: An Exploratory Study Ignore
An exploratory study on the barriers and enablers of online instruction in hospitality education, highlighting the need for AI training and support.
GitHub stars n/a Velocity flat History 1 snapshot EdTech AI Adoption Apr 27
Kohn-Sham Hamiltonian from Effective Field Theory: Quasiparticle Band Narrowing from Frozen Core Dynamics Ignore
Develops an effective field theory to explain discrepancies in Kohn-Sham eigenvalues and ARPES measurements for metals, offering a computationally inexpensive correction.
GitHub stars n/a Velocity flat History 1 snapshot Materials Science AI Apr 28
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers Ignore
This research identifies conditional misalignment in language models, where common interventions can hide emergent misbehavior that reappears when prompts resemble training data.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 28
Toward a Functional Geometric Algebra for Natural Language Semantics Ignore
Proposes geometric algebra as a superior mathematical foundation for natural language semantics, offering enhanced compositionality and interpretability over linear algebra.
GitHub stars n/a Velocity flat History 1 snapshot NLP Semantics Apr 28
Knowledge Distillation Must Account for What It Loses Ignore
This paper proposes a framework for accountable knowledge distillation, focusing on preserving critical teacher model capabilities beyond simple task scores to ensure reliable and safe student models.
GitHub stars n/a Velocity flat History 1 snapshot LLM Training Apr 28
Leverage Laws: A Per-Task Framework for Human-Agent Collaboration Ignore
A per-task leverage ratio framework for human-agent collaboration, analyzing information flow and task novelty to optimize efficiency.
GitHub stars n/a Velocity flat History 1 snapshot Human-Agent Collaboration Framework Apr 27
Three Models of RLHF Annotation: Extension, Evidence, and Authority Ignore
This paper theoretically distinguishes three models of human feedback in RLHF to guide annotation strategies and identify failure modes.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 28
Optimally Auditing Adversarial Agents Ignore
Develops algorithms for designing strategic audits to mitigate fraud in resource allocation domains by modeling audit policy design as a principal-agent game.
GitHub stars n/a Velocity flat History 1 snapshot Game Theory / AI Apr 28
Transformer Approximations from ReLUs Ignore
Develops a systematic recipe for translating ReLU approximation results to softmax attention mechanisms in transformers, providing new analytical tools.
GitHub stars n/a Velocity flat History 1 snapshot Transformer Theory Apr 27
AI as Consumer and Participant: A Co-Design Agenda for MBSE Substrates and Methodology Ignore
This paper argues for a co-design agenda between AI tools and MBSE models to enable AI participation beyond simple prompt-based interaction.
GitHub stars n/a Velocity flat History 1 snapshot AI for Engineering Apr 28
Internet of Everything in the 6G Era: Paradigms, Enablers, Potentials and Future Directions Ignore
Overview of Internet of Everything paradigms, enablers, and future directions in the 6G era.
GitHub stars n/a Velocity flat History 1 snapshot 6G IoE Apr 27
The Nonverbal Syntax Framework: An Evidence-Based Tiered System for Inferring Learner States from Observable Behavioral Cues Ignore
A comprehensive framework for inferring learner cognitive and affective states from observable nonverbal cues, based on a systematic review of existing research.
GitHub stars n/a Velocity flat History 1 snapshot Learner State Inference Apr 28
Frictive Policy Optimization for LLMs: Epistemic Intervention, Risk-Sensitive Control, and Reflective Alignment Ignore
A framework for learning language model policies that manage epistemic and normative risk through explicit intervention actions.
GitHub stars n/a Velocity flat History 1 snapshot LLM Alignment Apr 28
Value-Sensitive AI for Prayer: Balancing the Agencies Between Human and AI Agents in Spiritual Context Ignore
This paper explores value-sensitive AI systems for prayer, focusing on preserving user agency and authenticity in spiritual contexts.
GitHub stars n/a Velocity flat History 1 snapshot Value-Sensitive AI Apr 28
Verification of Neural Networks (Lecture Notes) Ignore
A theoretical introduction to the verification of neural networks, covering feed-forward and recurrent networks, attention mechanisms, and transformers.
GitHub stars n/a Velocity flat History 1 snapshot AI Theory Apr 28
On Halting vs Converging in Recurrent Graph Neural Networks Ignore
Establishes expressiveness relationships between different types of Recurrent Graph Neural Networks and introduces a coordination protocol for asynchronous halting.
GitHub stars n/a Velocity flat History 1 snapshot Graph Neural Networks Apr 28