Reference Asset

AI Research Dataset

An open artifact of 1,000 production rows with 12 fields, published as immutable JSON, CSV, and schema receipts with API parity. The freshness window ended 2026-04-23T17:29:57.989Z. Licensed under CC BY 4.0.

Schema Manifest

Stale · Updated 1mo ago · 1,000 sources · Methodology · API

Artifact

public-dataset-2026-04-22T17-29-57-989Z

SHA-256 (JSON)

2e10ffe1…6df4d7e9
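The digest above lets a consumer confirm that a downloaded JSON artifact is the one described here. The following is a minimal sketch, assuming the file has already been saved locally. The helper names (`file_sha256`, `matches_truncated`) are hypothetical and not part of any published client. Because the page shows the digest in truncated form, the sketch compares only the visible prefix and suffix, which is a weaker check than comparing against the full digest in the receipt manifest.

```python
import hashlib

# Digest exactly as displayed on the page; the middle is elided and is NOT filled in here.
DISPLAYED_DIGEST = "2e10ffe1…6df4d7e9"


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-megabyte artifact is never loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_truncated(digest: str, displayed: str, ellipsis: str = "…") -> bool:
    """Compare a full hex digest with a display string such as 'abcd…wxyz'.

    If the display string has no ellipsis, this falls back to an exact comparison.
    """
    digest = digest.lower()
    displayed = displayed.strip().lower()
    if ellipsis not in displayed:
        return digest == displayed
    prefix, suffix = displayed.split(ellipsis, 1)
    return (
        len(digest) >= len(prefix) + len(suffix)
        and digest.startswith(prefix)
        and digest.endswith(suffix)
    )
```

A typical call would be `matches_truncated(file_sha256("dataset.json"), DISPLAYED_DIGEST)`, where `dataset.json` is a placeholder for wherever the download was saved.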

Artifacts

JSON · 1.8 MB
CSV · 1.7 MB
Schema · 712 B

Freshness: Stale. Public dataset artifact is outside its declared freshness window.

Fresh until: 2026-04-23T17:29:57.989Z

Last updated: 2026-04-21T17:59:02.000Z

Rows: 1,000
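The stale status follows from comparing the current time with the "fresh until" timestamp. The following is a rough sketch of that comparison, using the two timestamps shown above. The function names are hypothetical, and treating the boundary instant itself as still fresh is an assumption, since the page does not say how the edge case is handled.

```python
from datetime import datetime, timezone

# Timestamps copied from the freshness panel.
FRESH_UNTIL = "2026-04-23T17:29:57.989Z"
LAST_UPDATED = "2026-04-21T17:59:02.000Z"


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime.

    The 'Z' is rewritten as '+00:00' so that datetime.fromisoformat also
    accepts it on Python versions older than 3.11.
    """
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def is_stale(fresh_until: str, now: datetime) -> bool:
    """Return True when `now` is strictly later than the freshness deadline.

    Assumption: an artifact checked exactly at the deadline still counts as fresh.
    """
    return now > parse_utc(fresh_until)


def current_status(fresh_until: str = FRESH_UNTIL) -> str:
    """Label the artifact using the current UTC time."""
    now = datetime.now(timezone.utc)
    return "Stale" if is_stale(fresh_until, now) else "Fresh"
```

Passing `now` explicitly to `is_stale` keeps the check deterministic, which makes it easier to test than a function that reads the clock itself.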


PREVIEW · 1,000 ROWS

Dataset sample rows
#	arxiv_id	Title	One-liner	Score	Cluster	Tags
1	2604.19740v1	Generalization at the Edge of Stability	This research theoretically explores the 'sharpness dimension' to understand generalization in large learning rate neural network training, offering insights into chaotic optimization regimes.	3	LLM Training	high_potential
2	2604.19734v1	UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning a…	UniT is a framework that bridges the gap between human and humanoid robot learning by creating a unified physical language for policy learning and world modeling.	7	Robotics	series_a_plus, high_potential
3	2604.19730v1	FASTER: Value-Guided Sampling for Fast RL	Develop a reinforcement learning tool that leverages value-guided sampling for improved efficiency and scalability.	7	Reinforcement Learning	quick_build, high_potential
4	2604.19728v1	VLA Foundry: A Unified Framework for Training Vision-Language-Action Models	VLA Foundry: A unified framework for training vision-language-action models in robotics.	8	AI Frameworks	quick_build, high_potential
5	2604.19724v1	Benign Overfitting in Adversarial Training for Vision Transformers	Theoretical analysis of adversarial training for Vision Transformers reveals conditions for benign overfitting, improving robustness.	3	Computer Vision

Showing 5 of 1,000 rows. Full export via API or download.

Column	Type	Example	Description
`arxiv_id`	string	`2604.19740v1`	Canonical arXiv identifier; primary key.
`title`	string	`Generalization at the Edge of Stability`	Paper title as published.
`abstract`	string	`Training modern neural networks…`	Original abstract text.
`published_date`	string (ISO 8601)	`2026-04-21T17:59:02+00:00`	Original publication date on arXiv.
`viability_score`	number \| null	`3`	Composite commercial viability rank.
`cluster_label`	string	`LLM Training`	Research field assigned during clustering.
`has_code`	boolean	`true`	True when an external repository URL is attached.
`repo_url`	string \| null	`https://github.com/owner/repo`	URL of the linked code repository, if any.
`commercial_flags`	string[]	`["has_code","high_potential"]`	Signal flags such as has_code or high_potential.
`one_liner`	string	`This research theoretically explores the 'sharpness dimensio…`	Short, human-readable summary of the paper.
`time_to_mvp`	string	`6+ months`	Coarse estimate of time required to ship an MVP.
`tags`	string[]	`["high_potential"]`	Topic tags applied during enrichment.
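The column table can be turned into a lightweight row check for anyone consuming the JSON export. The following is a minimal sketch, assuming each row is a JSON object keyed by the column names above. The `SCHEMA` mapping and `validate_row` function are hypothetical names, transcribed by hand from the table rather than taken from the published schema file. The sample row is assembled from the Example column, including its truncated values, and is not claimed to be an actual row of the dataset.

```python
# Column name -> (accepted Python types, nullable), transcribed from the column table.
SCHEMA = {
    "arxiv_id": (str, False),
    "title": (str, False),
    "abstract": (str, False),
    "published_date": (str, False),
    "viability_score": ((int, float), True),
    "cluster_label": (str, False),
    "has_code": (bool, False),
    "repo_url": (str, True),
    "commercial_flags": (list, False),
    "one_liner": (str, False),
    "time_to_mvp": (str, False),
    "tags": (list, False),
}

# Illustrative row built from the Example column; elided values are left elided.
EXAMPLE_ROW = {
    "arxiv_id": "2604.19740v1",
    "title": "Generalization at the Edge of Stability",
    "abstract": "Training modern neural networks…",
    "published_date": "2026-04-21T17:59:02+00:00",
    "viability_score": 3,
    "cluster_label": "LLM Training",
    "has_code": True,
    "repo_url": "https://github.com/owner/repo",
    "commercial_flags": ["has_code", "high_potential"],
    "one_liner": "This research theoretically explores the 'sharpness dimensio…",
    "time_to_mvp": "6+ months",
    "tags": ["high_potential"],
}


def validate_row(row: dict) -> list:
    """Return a list of human-readable problems; an empty list means the row passed."""
    problems = []
    for name, (types, nullable) in SCHEMA.items():
        if name not in row:
            problems.append(f"missing field: {name}")
            continue
        value = row[name]
        if value is None:
            if not nullable:
                problems.append(f"{name} must not be null")
            continue
        # bool is a subclass of int in Python, so reject it for non-boolean columns.
        if isinstance(value, bool) and types is not bool:
            problems.append(f"{name} has unexpected boolean value")
            continue
        if not isinstance(value, types):
            problems.append(f"{name} has wrong type: {type(value).__name__}")
            continue
        # The two string[] columns must contain only strings.
        if types is list and not all(isinstance(item, str) for item in value):
            problems.append(f"{name} must contain only strings")
    return problems
```

Returning a list of problems instead of raising on the first failure makes it easier to report every issue in a row at once, for example when scanning all 1,000 rows after a download.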

VIABILITY DISTRIBUTION

TOP RESEARCH CLUSTERS

CODE AVAILABILITY

Schema

PIPELINE

VERSIONING + CADENCE

LICENSE + ATTRIBUTION

Public Dataset

Endpoints

Quick start

Example response

Use the public dataset as a machine-readable proof surface