Question 1

What is ScienceToStartup?

Accepted Answer

ScienceToStartup ingests every new AI paper on arXiv, scores it for commercial viability, and surfaces the strongest candidates as fundable startup ideas. Founders, investors, and operators use it to spot research worth building before it trends elsewhere.

Question 2

What surfaces does the product include?

Accepted Answer

Signal Canvas for citation-first research synthesis; the Daily Dashboard for today's top papers; Build Loop for build/watch/verify decisions; Foresight for verifiable predictions; Talent for finding builders; Buildability for reproducibility receipts; Trends for narratives; and a public MCP server for agents.

Question 3

Who is ScienceToStartup built for?

Accepted Answer

Three audiences: founders looking for technical wedges that have a chance of becoming a $1B company, investors who need an early-warning system for emerging AI breakthroughs, and researchers exploring whether their own paper could ship as a product or OSS tool.

Question 4

How current is the data?

Accepted Answer

arXiv ingestion runs daily at 20:05 UTC; enrichment + scoring at 20:30 UTC; publish at 21:00 UTC. New papers appear on the dashboard within a few hours of being posted to arXiv. GitHub star velocity refreshes daily. Topic narratives regenerate weekly.

Question 5

What is the Daily Dashboard?

Accepted Answer

A single quiet-canvas page at `/dashboard` showing today's highest-Signal-Fusion papers, the Today Rail of must-read items, recent prediction-market activity, GitHub velocity changes, and the action rail for what to do next. Refreshed once a day, after the daily ingestion and publish cycle completes.

Question 6

What is a Viability Score?

Accepted Answer

A 0-10 rating of how commercially viable an AI paper is. It evaluates whether code is public, whether a demo exists, whether a dataset is released, the authors' track record, the market timing, and the competitive landscape. Higher scores mean a clearer wedge from research to product.

Question 7

What is Signal Fusion?

Accepted Answer

Signal Fusion is the composite ranking score. It blends the LLM-graded Viability Score, Foresight prediction-market confidence, community votes, and GitHub star velocity into one number. Signal Fusion is the primary sort order on the dashboard and the canonical 'how big is this signal?' metric.

Question 8

What is a Buildability Receipt?

Accepted Answer

A signed, evidence-backed receipt that records whether a paper is actually buildable — code public, dependencies pinned, holdout coverage measured, scaffold contract present. Status is one of `buildable`, `conditionally_buildable`, `not_buildable`, or `unknown`. Verifiable via the `/api/buildability/verify-receipt` endpoint.

Question 9

What is a Paper Pack?

Accepted Answer

A compact, citation-grade envelope for one paper. Carries title, abstract, viability score, commercial flags, one-liner pitch, time-to-MVP, and tags — sized to fit an agent context window (≤ 6144 bytes). Available at `/api/v1/paper/{arxivId}/paper-pack` for both humans and machines.
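A minimal sketch of fetching a Paper Pack and checking it against the stated byte budget. The host is taken from the MCP URL quoted elsewhere in this FAQ; the helper names and the percent-encoding of the id are assumptions for illustration, not part of the published API.

```python
from urllib.parse import quote

# Host as it appears in the MCP URL in this FAQ; override for other environments.
BASE_URL = "https://sciencetostartup.com"
# Size budget stated in the Paper Pack answer (<= 6144 bytes).
PAPER_PACK_MAX_BYTES = 6144


def paper_pack_url(arxiv_id: str, base_url: str = BASE_URL) -> str:
    """Build the Paper Pack URL for one arXiv id (hypothetical helper).

    Percent-encoding the id is an assumption; new-style ids such as
    2401.00001 pass through unchanged.
    """
    return f"{base_url}/api/v1/paper/{quote(arxiv_id, safe='')}/paper-pack"


def fits_agent_budget(body: bytes, limit: int = PAPER_PACK_MAX_BYTES) -> bool:
    """Return True when the raw response body is within the byte budget."""
    return len(body) <= limit


# Usage (performs a network call, so it is left commented out):
# import urllib.request
# with urllib.request.urlopen(paper_pack_url("2401.00001"), timeout=10) as resp:
#     body = resp.read()
# assert fits_agent_budget(body)
```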

Question 10

What is Foresight?

Accepted Answer

Foresight is the verifiable-prediction surface. We mint frozen predictions for papers — startup, oss_traction, acquisition, adoption, none — and publish them on a public ledger that cannot be edited after the fact. Backtests, calibration curves, and Brier scores live alongside so you can audit our hit rate.

Question 11

What is the MCP server?

Accepted Answer

A Model Context Protocol server exposing tools to agents over streamable HTTP. The public read-only tools are `search_signal_canvas`, `search_buildable_papers`, `get_buildability_receipt`, and `get_repro_requirements`; the write tool `submit_build_attempt` requires authentication. Connect from Claude Desktop, Cursor, or any MCP-aware client.

Question 12

Do agents need an API key?

Accepted Answer

No, not for the public read tools. Search, paper-pack, buildability-receipt, and capabilities endpoints are open to anonymous agent traffic. Authentication is only required for write actions (build attempts, workspace persistence) and rate-limited bearer-token surfaces.

Question 13

How do agents discover the surface?

Accepted Answer

Three discovery files. `/llms.txt` is the agent-facing concise pointer with inline glossary, FAQ, and capability sections. `/llms-full.txt` extends with the top papers. `/api/capabilities.json` is the structured machine manifest of every REST + MCP endpoint with handoff contracts.

Question 14

Is there an OpenAPI spec?

Accepted Answer

Yes. Live at `/api/openapi.json`, auto-synced from the FastAPI router. Use it with any OpenAPI client generator (openapi-typescript, openapi-generator, etc.) or with tooling such as the OpenAI Agents SDK to scaffold an integration against the public surfaces.

Question 15

Can I embed paper scores on my site?

Accepted Answer

Yes. The Embed Widget at `/api/embed/{arxivId}` returns a self-contained HTML badge (under 10 KB) with the paper's viability score, grade, and confidence. ETag caching, CORS open, UTM tracking included. Drop it into a blog post or research-tracker without a build step.

Question 16

How do I report a bug or request a feature?

Accepted Answer

Open an issue at github.com/sciencetostartup or email the contact address on `/contact`. Include the URL you were on, the arXiv id (if a paper page), and any console/network error. We triage daily.

Question 17

Is ScienceToStartup free for researchers?

Accepted Answer

Yes. The core dashboard, paper pages, Signal Canvas, glossary, FAQ, topic pages, MCP tools, and OpenAPI are completely free for individuals. Enterprise tiers cover TTO dashboards, scout reports, audit-trail exports, and API rate limits — see `/pricing` and `/enterprise`.

Question 18

What does an enterprise plan cost?

Accepted Answer

Enterprise pricing is negotiated and depends on seats, API volume, and integration scope. Common tiers: TTO dashboards for universities, scout reports for venture firms, RFP marketplace for corporate R&D. Contact sales via `/enterprise` for a quote tied to your usage.

Question 19

What data do I own as an enterprise customer?

Accepted Answer

All scout reports, saved workspaces, and exported artifacts you generate are yours. The underlying corpus (papers, scoring, receipts) is licensed for use within your team but not for redistribution. Full terms live at `/legal/terms-of-service`.

Question 20

How does Talent find lab builders?

Accepted Answer

Talent indexes authors, repos, and demos linked to viability-scored papers. You find researchers by what they have actually shipped — public code, demos, dataset releases — rather than by their CV.

Question 21

Is the Talent profile data consent-based?

Accepted Answer

Profiles are built from public arXiv, GitHub, and demo metadata. Researchers can request a takedown via the contact address on every profile page; takedowns propagate within 24 hours.

Question 22

Can I export Talent search results?

Accepted Answer

Yes. Enterprise customers can export shortlists as CSV or JSON. The public surface is read-only and rate-limited but free; sign-in unlocks saved shortlists and outreach drafting.

Question 23

How do I verify a buildability receipt?

Accepted Answer

POST the receipt JSON to `/api/buildability/verify-receipt`. The endpoint runs an ed25519 verifier and returns `counts_as_wave7_completion=false` for unsigned or placeholder receipts.
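A minimal sketch of the verification call using only the standard library. The host comes from the MCP URL quoted in this FAQ, and the only response field assumed is `counts_as_wave7_completion`, which the answer above names; the rest of the response shape is not documented here, so the helper reads nothing else.

```python
import json
import urllib.request

# Host taken from the MCP URL in this FAQ; the path is the one stated above.
VERIFY_URL = "https://sciencetostartup.com/api/buildability/verify-receipt"


def build_verify_request(receipt: dict, url: str = VERIFY_URL) -> urllib.request.Request:
    """Wrap a receipt in a JSON POST request (hypothetical helper)."""
    body = json.dumps(receipt).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def counts_as_wave7(response_json: dict) -> bool:
    """Fail closed: a missing or non-true field is treated as false."""
    return response_json.get("counts_as_wave7_completion") is True


# Usage (performs a network call, so it is left commented out):
# with urllib.request.urlopen(build_verify_request(receipt), timeout=10) as resp:
#     print(counts_as_wave7(json.load(resp)))
```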

Question 24

Can I share or cite ScienceToStartup pages?

Accepted Answer

Yes — every paper page, topic page, and trend report has a stable canonical URL with `ScholarlyArticle` or `CollectionPage` JSON-LD. Twitter/LinkedIn cards render a per-page Open Graph image. Quote freely with attribution; we provide a citation block at the bottom of each paper page.

Question 25

Do Answer Engines like Perplexity cite ScienceToStartup?

Accepted Answer

We publish schema-anchored Q&A and DefinedTerm semantics on every high-traffic page so that Perplexity, ChatGPT search, Claude, Gemini, and Google AI Overviews can quote us verbatim. If you spot a missing citation in an Answer Engine, send the prompt to our contact address — we use these as eval seeds.

Question 26

What is a Brier Score and why does Foresight publish it?

Accepted Answer

Brier Score is a proper scoring rule for probabilistic predictions: it measures how well a forecasted probability matched the outcome that actually happened. Lower is better. Foresight publishes the Brier outcome for every frozen prediction batch so anyone can audit whether the model is over-confident or under-confident. The public Brier receipt is one of the gates for completing Buildability Wave 7.
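To make the definition concrete, here is a small sketch of the multi-class Brier score over Foresight-style outcome labels. The function name and input shape are assumptions for illustration; the platform's own implementation is not shown in this FAQ. In this multi-class form the score ranges from 0 (perfect) to 2 (fully confident and wrong).

```python
from typing import Mapping, Sequence


def brier_score(forecasts: Sequence[Mapping[str, float]], outcomes: Sequence[str]) -> float:
    """Mean squared error between forecast probabilities and one-hot outcomes.

    forecasts[i] maps each outcome label to its predicted probability;
    outcomes[i] is the label that actually happened.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    total = 0.0
    for probs, actual in zip(forecasts, outcomes):
        # Include the realised label even if the forecast omitted it (probability 0).
        labels = set(probs) | {actual}
        total += sum(
            (probs.get(label, 0.0) - (1.0 if label == actual else 0.0)) ** 2
            for label in labels
        )
    return total / len(forecasts)


# A coin-flip forecast between two labels scores 0.5; a perfect forecast scores 0.0.
example = brier_score([{"startup": 0.5, "none": 0.5}], ["startup"])
```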

Question 27

How is Foresight calibrated?

Accepted Answer

Every frozen prediction batch is replayed against the outcomes the world has since revealed. We bucket predictions by confidence, count how many actually came true, and publish a calibration curve next to the ledger. If the curve diverges from the diagonal, the model is over- or under-confident and the next training round corrects for it. The data is downloadable, not just rendered as a chart.
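The bucketing step can be sketched as follows. This is an illustrative version with equal-width buckets; the bucket count and function name are assumptions, not the platform's published configuration.

```python
from typing import List, Sequence, Tuple


def calibration_curve(
    confidences: Sequence[float],
    came_true: Sequence[bool],
    n_buckets: int = 10,
) -> List[Tuple[float, float, int]]:
    """Return (mean confidence, observed frequency, count) per non-empty bucket."""
    if len(confidences) != len(came_true):
        raise ValueError("confidences and came_true must have the same length")
    counts = [0] * n_buckets
    conf_sums = [0.0] * n_buckets
    hit_sums = [0] * n_buckets
    for conf, hit in zip(confidences, came_true):
        # Equal-width buckets over [0, 1]; a confidence of exactly 1.0 goes in the last one.
        idx = min(int(conf * n_buckets), n_buckets - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hit_sums[idx] += 1 if hit else 0
    return [
        (conf_sums[i] / counts[i], hit_sums[i] / counts[i], counts[i])
        for i in range(n_buckets)
        if counts[i]
    ]
```

A well-calibrated model produces points close to the diagonal, where mean confidence equals observed frequency. A bucket with mean confidence 0.9 but observed frequency 0.6 indicates over-confidence.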

Question 28

What is the Foresight hit rate?

Accepted Answer

Hit rate is the share of frozen predictions whose top-ranked outcome (startup, oss_traction, acquisition, adoption, none) matched the outcome that actually happened by the countdown date. We compute it per batch and per cohort. Numbers live next to the backtest run on the Foresight ledger so a reader does not have to take our word for it.
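As an illustration of that definition, a hit-rate calculation over one batch might look like this. The input shape (one probability map per prediction) and the function names are assumptions.

```python
from typing import Mapping, Sequence


def top_outcome(probs: Mapping[str, float]) -> str:
    """Return the label with the highest predicted probability."""
    return max(probs, key=lambda label: probs[label])


def hit_rate(forecasts: Sequence[Mapping[str, float]], outcomes: Sequence[str]) -> float:
    """Share of predictions whose top-ranked label matched the realised outcome."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    hits = sum(1 for probs, actual in zip(forecasts, outcomes) if top_outcome(probs) == actual)
    return hits / len(forecasts)
```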

Question 29

What is a Backtest Run?

Accepted Answer

A Backtest Run replays a frozen prediction batch against ground truth that has emerged since mint. The result is a hit rate, precision-at-k, a calibration curve, and a per-paper breakdown of correct versus missed predictions. Backtests run on every retrained model so the Foresight ledger never falls out of sync with reality.
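Precision-at-k is the least familiar of those outputs, so here is a short sketch of the standard definition: rank papers by predicted score and measure what fraction of the top k turned out positive. How Foresight defines "positive" and chooses k is not stated here; both are left to the caller in this example.

```python
from typing import Sequence, Tuple


def precision_at_k(scored: Sequence[Tuple[float, bool]], k: int) -> float:
    """Fraction of the k highest-scored items that were actual positives.

    scored holds (predicted score, was_positive) pairs. If fewer than k items
    exist, the fraction is taken over the items available.
    """
    if k <= 0 or not scored:
        raise ValueError("k must be positive and scored must be non-empty")
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    return sum(1 for _, positive in top if positive) / len(top)
```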

Question 30

What is the Foresight Flywheel?

Accepted Answer

Foresight runs a self-improvement loop. We mint frozen predictions, observe outcomes, retrain the scoring and prediction models on the new ground truth, then mint better predictions. Three publication states keep the public ledger honest about how mature each iteration is: seed_only, outcomes_observed, retrained_published. The flywheel is the reason calibration improves over time rather than drifting.

Question 31

What is a Buildability Wave?

Accepted Answer

Buildability Waves stage the proof that a paper is reproducible. Wave 0 is metadata only; Wave 7 requires founder signatures, live telemetry, external adoption, real-judge divergence, and the public Brier receipt. Each step up tightens the contract a Buildability Receipt has to satisfy before the platform calls a paper buildable. Waves let us ship the surface long before every check is perfect.

Question 32

Why are Buildability Receipts signed?

Accepted Answer

Every Buildability Receipt carries an ed25519 signature with a key id. The signature proves the platform actually issued the receipt and nobody downstream forged or edited it. POST a receipt to /api/buildability/verify-receipt and the verifier returns whether the signature is valid and whether the receipt counts as Wave 7 completion. Unsigned receipts fail closed.

Question 33

What is adversarial divergence?

Accepted Answer

Adversarial divergence is a judge-run comparison test: we have multiple independent evaluators (LLMs and humans) score the same Buildability Receipt and measure how often they disagree. Low divergence means the assessment is robust; high divergence flags a check that is ambiguous or under-specified. Divergence outcomes are receipt-backed and feed Wave 7 completion.
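One simple way to quantify "how often they disagree" is the pairwise disagreement rate across evaluators for a single receipt. This is an illustrative metric chosen for the example; the FAQ does not specify the exact formula the platform uses.

```python
from itertools import combinations
from typing import Sequence


def pairwise_divergence(verdicts: Sequence[str]) -> float:
    """Fraction of evaluator pairs whose verdicts differ (0.0 = full agreement)."""
    pairs = list(combinations(verdicts, 2))
    if not pairs:
        # With fewer than two evaluators there is nothing to disagree about.
        return 0.0
    return sum(1 for a, b in pairs if a != b) / len(pairs)
```

With three evaluators returning `buildable`, `buildable`, and `not_buildable`, two of the three pairs disagree, so the divergence is 2/3.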

Question 34

What is hindsight calibration?

Accepted Answer

Hindsight calibration is the retrospective check on Buildability: did the receipts we issued match the build outcomes that actually happened? We publish a hindsight Brier outcome on a public ledger with three lifecycle states (seed_only, outcomes_observed, retrained_published) so a reader can see how mature the calibration of the current model generation is.

Question 35

Which MCP tools are public at launch?

Accepted Answer

The public launch subset is read-only: search_signal_canvas, search_buildable_papers, get_buildability_receipt, get_repro_requirements, plus the capability and discovery resources. Write tools like submit_build_attempt sit behind authentication. The streamable HTTP endpoint and its descriptor are both served at /api/mcp. Full list at /developers/mcp with example clients for Claude Desktop, Cursor, and the OpenAI Agents SDK.
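For clients without an MCP library, the wire format is plain JSON-RPC 2.0 over HTTP POST. The sketch below builds an `initialize` request and a `tools/list` request following the MCP streamable HTTP transport. The client name and version are placeholders, the protocol version shown is an assumption about what the server accepts, and a full client would also send the `notifications/initialized` notification and echo any session id header the server returns.

```python
import json
import urllib.request
from typing import Optional

# Full URL as quoted in the Claude Desktop answer of this FAQ.
MCP_URL = "https://sciencetostartup.com/api/mcp"


def jsonrpc_request(method: str, params: Optional[dict] = None, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request message."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def mcp_post(message: dict, url: str = MCP_URL) -> urllib.request.Request:
    """Wrap a message in a POST that accepts either JSON or an SSE stream."""
    return urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


initialize = jsonrpc_request(
    "initialize",
    {
        "protocolVersion": "2025-03-26",  # assumed; use whatever the server negotiates
        "capabilities": {},
        "clientInfo": {"name": "faq-example", "version": "0.1.0"},  # placeholder
    },
)
list_tools = jsonrpc_request("tools/list", request_id=2)
```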

Question 36

How do I connect the MCP server from Claude Desktop?

Accepted Answer

Add a streamable-http MCP entry pointing at https://sciencetostartup.com/api/mcp. Claude Desktop will list the public read tools (search_signal_canvas, get_buildability_receipt, get_repro_requirements, search_buildable_papers) automatically. No API key needed for the launch subset. Cursor and any MCP-aware client use the same URL. Full walkthrough at /developers/mcp.

Question 37

When should I use REST instead of MCP?

Accepted Answer

Use REST for stateless integrations, scripts, and platforms that do not speak Model Context Protocol — paper-pack fetches, search, embed widgets, sitemap polling. Use MCP when an agent needs typed tool discovery, streaming output, and capability resolution at runtime. Both surfaces sit on the same data: a REST GET and an MCP get_buildability_receipt return the identical receipt envelope.

Question 38

Does the OpenAI Agents SDK work with ScienceToStartup?

Accepted Answer

Yes. Generate a typed client from /api/openapi.json and the OpenAI Agents SDK can call every public REST endpoint as a tool. For MCP integration, point an Agents SDK MCPServerStreamableHttp at /api/mcp to expose the read-only tool set. Sample agent code lives in /developers/examples; the public corpus is large enough to run useful evals without an enterprise plan.

Question 39

What are the public API rate limits?

Accepted Answer

Anonymous traffic is shaped at 60 requests per minute per IP across all public REST surfaces and the MCP read tools. Bursts up to 120 RPM are absorbed. Authenticated bearer-token traffic on enterprise plans negotiates higher quotas. Embed-widget and llms.txt fetches are served from edge cache and do not count against the limit. 429 responses include a Retry-After header.
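A client that respects these limits can pace itself at one request per second and honour `Retry-After` on a 429. The sketch below parses the header in both forms HTTP allows (delay in seconds or an HTTP date). The function name and the one-second fallback are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# 60 requests per minute per IP works out to one request per second on average.
MIN_INTERVAL_SECONDS = 60 / 60


def retry_after_seconds(
    header_value: Optional[str],
    now: Optional[datetime] = None,
    default: float = 1.0,
) -> float:
    """Convert a Retry-After header into a non-negative wait in seconds."""
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: fall back to the default wait.
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```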

Question 40

Do you support batch jobs against the corpus?

Accepted Answer

Enterprise customers can submit a batch job through the Batch Scheduler: a list of arXiv IDs and a question template, and the service runs the same Research Kernel against each paper using provider batching APIs. Receipts attach to every output so the result is auditable. Public surfaces do not expose batch yet; open an issue if you need a public batch endpoint and we will scope one.

Question 41

Which arXiv categories does the corpus cover?

Accepted Answer

Every cs.AI, cs.CL, cs.CV, cs.LG, cs.MA, cs.NE, and stat.ML paper is ingested. Cross-listed papers count once. Adjacent categories (cs.RO, cs.HC, q-bio) are partially ingested when the abstract trips an AI relevance classifier. The full per-category freshness state, hours-since-last-ingest, and paper count are published at /api/freshness.json.

Question 42

What are the three extraction tiers?

Accepted Answer

Tiered Extraction picks how deep we parse a paper. Tier 1 (light) gets title, abstract, and metadata. Tier 2 (standard) adds sections, figures, and claims. Tier 3 (exhaustive) does full text, tables, and formulas. The tier a paper lands in drives the quality of its viability score, claim set, and evidence pack. Tier assignment is visible on every paper page.

Question 43

What is the Golden Corpus?

Accepted Answer

The Golden Corpus is the hand-labelled evaluation set the scoring pipeline regresses against on every deploy. It contains a stratified sample of papers with known commercial outcomes, signed by a human reviewer. New scoring models must beat the previous generation on the Golden Corpus before they ship. The evaluation hash is stamped into every paper score so a downstream reader can trace which model produced which number.

Question 44

How do you fetch new papers from arXiv?

Accepted Answer

We pull from the arXiv OAI-PMH endpoint daily at 20:05 UTC, using a resumption-token cursor so we never miss a paper. Each ingestion run records a lineage hash, a paper count, and an SLA timer. If OAI-PMH is slow we fall back to the bulk-data S3 manifest. The full ingestion log is queryable at /api/freshness.json with hours-since-last-ingest per category.
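The resumption-token cursor works as in any OAI-PMH harvest: the first `ListRecords` request names a metadata format and set, and each later request carries only the token from the previous page until the server returns an empty one. The sketch below shows that loop with an injected `fetch` function so it can run without network access. The base URL, the `arXiv` metadata prefix, and the `cs` set are illustrative choices, not a description of our production harvester.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(
    base_url: str,
    token: Optional[str] = None,
    metadata_prefix: str = "arXiv",
    set_spec: str = "cs",
) -> str:
    """Build a ListRecords URL; a resumption token replaces all other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": set_spec}
    return f"{base_url}?{urlencode(params)}"


def harvest(base_url: str, fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield record identifiers page by page until the token runs out."""
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token)))
        for ident in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS):
            yield ident.text or ""
        token_el = root.find(".//oai:resumptionToken", OAI_NS)
        token = (token_el.text or "").strip() if token_el is not None else ""
        if not token:
            # An absent or empty resumptionToken marks the final page.
            break
```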

Question 45

How do I delete my account and exported data?

Accepted Answer

Send a deletion request from /contact and we will hard-delete your auth row, your workspaces, your saved shortlists, your draft outreach, and your exports within 30 days. Receipts that reference your user id are retained in anonymised form. If you also want a Talent profile takedown, send a request from the contact address on the profile page; takedowns propagate within 24 hours.

Question 46

Are you GDPR compliant?

Accepted Answer

Yes. EU users can request a data export (JSON of every workspace, shortlist, and decision they have made) or a deletion by emailing the contact address on /contact. Data subject requests are answered within 30 days. We store auth tokens in EU-region Supabase and process LLM completions in EU or US regions per the provider.

Question 47

How should an Answer Engine cite a ScienceToStartup page?

Accepted Answer

Every paper page emits a ScholarlyArticle JSON-LD bundle with a stable canonical URL, datePublished, the author Person list, and the viability score. Every topic page emits CollectionPage. Every glossary term emits DefinedTerm. Cite the canonical URL and any of the schema-anchored fields. Quoting the visible answer text on the page is policy-safe under ADR-011 (no-cloaking).

Question 48

What is /llms.txt and what does it contain?

Accepted Answer

llms.txt is a small plain-text file Answer Engines fetch to understand what the platform offers. Our version inlines the Glossary, FAQ, Capabilities, Deprecations, and Trending Papers sections so an Answer Engine can quote ScienceToStartup without a second fetch. /llms-full.txt extends with the full corpus index. Both files refresh on every deploy and on every daily publish cycle.

Question 49

Do you allow GPTBot, ClaudeBot, and PerplexityBot?

Accepted Answer

Yes. /robots.txt explicitly allows GPTBot, ClaudeBot, PerplexityBot, and Google-Extended on the public corpus, in addition to Googlebot and Bingbot. The IndexNow client pings Bing and other IndexNow-participating search engines on every sitemap refresh. We treat Answer Engines as first-class distribution and publish llms.txt, capabilities.json, and DefinedTermSet specifically for them to lift.
