Reference Asset
Open artifact of 1,000 production rows with 12 fields, immutable JSON/CSV/schema receipts, and API parity. Freshness window ended 2026-04-23T17:29:57.989Z. CC BY 4.0.
PREVIEW · 1,000 ROWS
| # | arxiv_id | Title / one-liner | Score | Cluster | Code | Tags |
|---|---|---|---|---|---|---|
| 1 | 2604.19740v1 | **Generalization at the Edge of Stability** This research theoretically explores the 'sharpness dimension' to understand generalization in large learning rate neural network training, offering insights into chaotic optimization regimes. | 3 | LLM Training | high_potential | |
| 2 | 2604.19734v1 | **UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning a…** UniT is a framework that bridges the gap between human and humanoid robot learning by creating a unified physical language for policy learning and world modeling. | 7 | Robotics | series_a_plus, high_potential | |
| 3 | 2604.19730v1 | **FASTER: Value-Guided Sampling for Fast RL** Develop a reinforcement learning tool that leverages value-guided sampling for improved efficiency and scalability. | 7 | Reinforcement Learning | quick_build, high_potential | |
| 4 | 2604.19728v1 | **VLA Foundry: A Unified Framework for Training Vision-Language-Action Models** VLA Foundry: A unified framework for training vision-language-action models in robotics. | 8 | AI Frameworks | quick_build, high_potential | |
| 5 | 2604.19724v1 | **Benign Overfitting in Adversarial Training for Vision Transformers** Theoretical analysis of adversarial training for Vision Transformers reveals conditions for benign overfitting, improving robustness. | 3 | Computer Vision | | |
Showing 5 of 1,000 rows. Full export via API or download.
| Column | Type | Example | Description |
|---|---|---|---|
| arxiv_id | string | 2604.19740v1 | Canonical arXiv identifier; primary key. |
| title | string | Generalization at the Edge of Stability | Paper title as published. |
| abstract | string | Training modern neural networks… | Original abstract text. |
| published_date | string (ISO 8601) | 2026-04-21T17:59:02+00:00 | Original publication date on arXiv. |
| viability_score | number \| null | 3 | Composite commercial viability rank. |
| cluster_label | string | LLM Training | Research field assigned during clustering. |
| has_code | boolean | true | True when an external repository URL is attached. |
| repo_url | string \| null | https://github.com/owner/repo | URL of the linked code repository, if any. |
| commercial_flags | string[] | ["has_code","high_potential"] | Signal flags such as has_code or high_potential. |
| one_liner | string | This research theoretically explores the 'sharpness dimensio… | Short, human-readable summary of the paper. |
| time_to_mvp | string | 6+ months | Coarse estimate of time required to ship an MVP. |
| tags | string[] | ["high_potential"] | Topic tags applied during enrichment. |
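For client code, the schema above can be mirrored as a typed record. The sketch below is one possible Python mapping, not an official client: the names `PaperRow` and `row_from_dict` are hypothetical, and it assumes each exported row carries all twelve fields listed in the table.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class PaperRow:
    """Hypothetical typed view of one row; field names come from the schema table."""
    arxiv_id: str                      # primary key
    title: str
    abstract: str
    published_date: str                # ISO 8601 string, kept unparsed
    viability_score: Optional[float]   # number | null
    cluster_label: str
    has_code: bool
    repo_url: Optional[str]            # string | null
    commercial_flags: tuple[str, ...]  # string[]
    one_liner: str
    time_to_mvp: str
    tags: tuple[str, ...]              # string[]


def row_from_dict(raw: dict[str, Any]) -> PaperRow:
    """Build a PaperRow; raises KeyError if a non-nullable field is missing."""
    return PaperRow(
        arxiv_id=raw["arxiv_id"],
        title=raw["title"],
        abstract=raw["abstract"],
        published_date=raw["published_date"],
        viability_score=raw.get("viability_score"),  # may be null
        cluster_label=raw["cluster_label"],
        has_code=bool(raw["has_code"]),
        repo_url=raw.get("repo_url"),                # may be null
        commercial_flags=tuple(raw["commercial_flags"]),
        one_liner=raw["one_liner"],
        time_to_mvp=raw["time_to_mvp"],
        tags=tuple(raw["tags"]),
    )
```

An abbreviated row that omits fields such as `abstract` would raise `KeyError` here; a more lenient mapping could supply defaults instead.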
arXiv ingest: daily
Dedupe: near-duplicate authors + abstract
Score: viability composite
Snapshot: immutable artifact
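The page names the dedupe step only as "near-duplicate authors + abstract" and does not describe the algorithm. Purely as an illustration of what such a check might look like, the sketch below flags two records as near-duplicates when their author sets overlap strongly and their abstracts are textually similar. Everything in it is an assumption: the function names, the `authors` field (which is not part of the public schema), the Jaccard and `difflib` similarity measures, and both thresholds.

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def author_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard overlap of two normalized author-name sets (0.0 if either is empty)."""
    set_a = {_normalize(name) for name in a}
    set_b = {_normalize(name) for name in b}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def abstract_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def is_near_duplicate(
    p: dict,
    q: dict,
    author_threshold: float = 0.8,    # assumed cutoff, not from the source
    abstract_threshold: float = 0.9,  # assumed cutoff, not from the source
) -> bool:
    """True when both the author overlap and abstract similarity clear their cutoffs."""
    return (
        author_overlap(p["authors"], q["authors"]) >= author_threshold
        and abstract_similarity(p["abstract"], q["abstract"]) >= abstract_threshold
    )
```

Requiring both signals keeps two different papers by the same group from being merged, at the cost of missing resubmissions whose abstracts were heavily rewritten.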
dataset-public-v3 · dataset_export_v3
Licensed under CC BY 4.0. Attribute as:
ScienceToStartup — AI Research Dataset, artifact public-dataset-2026-04-22T17-29-57-989Z. https://sciencetostartup.com/resources/dataset
Agent Handoff
Canonical ID dataset | Route /resources/dataset
REST example
curl https://sciencetostartup.com/api/v1/agent-handoff/dataset/dataset
MCP example
{
  "tool": "search_papers",
  "arguments": {
    "query": "dataset export"
  }
}
source_context
{
  "surface": "dataset",
  "mode": "resource",
  "query": "public dataset",
  "normalized_query": "dataset",
  "route": "/resources/dataset",
  "paper_ref": null,
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": "dataset"
}
/api/v1/resources/dataset
Returns the artifact manifest: schema, freshness, immutable URLs.
/api/v1/resources/dataset/export?format=json
Streams every row as JSON. Array fields stay arrays.
/api/v1/resources/dataset/export?format=csv
Flat CSV export. Array fields flatten to semicolon-delimited strings.
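Because the CSV export flattens array fields into semicolon-delimited strings, a consumer has to split them back into lists. The following is a minimal sketch using Python's standard `csv` module. It assumes the CSV header uses the same column names as the schema, that `commercial_flags` and `tags` are the only array columns, and that an empty cell means an empty list. The helper name `parse_csv_export` is hypothetical, and the inline sample is made-up input, not actual export output.

```python
import csv
import io

# The string[] fields in the schema; assumed to be the only flattened columns.
ARRAY_COLUMNS = ("commercial_flags", "tags")


def parse_csv_export(text: str) -> list[dict]:
    """Parse CSV text and turn semicolon-delimited cells back into lists."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = dict(record)
        for column in ARRAY_COLUMNS:
            cell = row.get(column) or ""
            # Drop empty fragments so an empty cell becomes [].
            row[column] = [item for item in cell.split(";") if item]
        rows.append(row)
    return rows


# Illustrative sample only; real exports carry more columns.
SAMPLE = (
    "arxiv_id,cluster_label,commercial_flags,tags\n"
    "2604.19740v1,LLM Training,has_code;high_potential,high_potential\n"
    "2604.19724v1,Computer Vision,,\n"
)
rows = parse_csv_export(SAMPLE)
```

All other values stay as strings after parsing, so fields such as `has_code` and `viability_score` still need their own conversion. The JSON export avoids this step because its array fields stay arrays.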
curl -s "https://sciencetostartup.com/api/v1/resources/dataset/export?format=json" | jq '.data[0]'
{
  "data": [
    {
      "arxiv_id": "2604.19740v1",
      "title": "Generalization at the Edge of Stability",
      "viability_score": 3,
      "cluster_label": "LLM Training",
      "has_code": true,
      "one_liner": "This research theoretically explores the 'sharpness dimension' to understand generalization in large learning rate neural network training, offering insights into chaotic optimization regimes.",
      "tags": [
        "high_potential"
      ]
    }
  ],
  "meta": {
    "count": 1,
    "source_count": 1000,
    "artifact_id": "public-dataset:2026-04-22T17-29-57-989Z",
    "schema_version": "dataset-public-v3",
    "exported_at": "2026-04-22T17:29:57.989Z"
  }
}
Use This Via API or MCP
Pull the dataset through REST, reference it from llms.txt, or use it as the stable evidence layer behind agent workflows that need paper metadata, scores, and exports.
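As one way to pull the JSON export from Python rather than curl, the sketch below fetches the export URL listed earlier and separates the rows from the `meta` block. It assumes the response is a single JSON document with the `data`/`meta` envelope shown in the sample response. The function names `split_envelope`, `fetch_export`, and `rows_with_code` are hypothetical, and no authentication, pagination, or rate-limit handling is attempted because the page does not describe any.

```python
import json
import urllib.request

EXPORT_URL = "https://sciencetostartup.com/api/v1/resources/dataset/export?format=json"


def split_envelope(payload: str) -> tuple[list[dict], dict]:
    """Parse a response body and return (rows, meta)."""
    body = json.loads(payload)
    return body["data"], body.get("meta", {})


def fetch_export(url: str = EXPORT_URL, timeout: float = 30.0) -> tuple[list[dict], dict]:
    """Download the export and split it; performs a real network request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return split_envelope(response.read().decode("utf-8"))


def rows_with_code(rows: list[dict], min_score: float = 0) -> list[dict]:
    """Keep rows with a linked repository and a non-null score >= min_score."""
    return [
        row
        for row in rows
        if row.get("has_code")
        and row.get("viability_score") is not None
        and row["viability_score"] >= min_score
    ]
```

A caller would typically run `rows, meta = fetch_export()` and record `meta["artifact_id"]` alongside any derived results, so they can be traced back to a specific immutable snapshot.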