ARXIV:2601.19197 · RECOMMENDER SYSTEMS EVALUATION · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified · Source: PDF linked | Partial · PaperPack: 3 of 4 citation fields filled | Missing · Missing fields: authors | Partial · Proof: unverified proof status

HELM: A Human-Centered Evaluation Framework for LLM-Powered Recommender Systems

arXiv

HELM is an open-source toolkit for human-centered evaluation of LLM-powered recommender systems in real-world user experiences.

Blocked on Code › Score 7.0 · Evidence unverified

Opportunity summary

Pain HELM is an open-source toolkit for human-centered evaluation of LLM-powered recommender systems in real-world user experiences.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified


PROBLEM

HELM is an open-source toolkit for human-centered evaluation of LLM-powered recommender systems in real-world user experiences. However, existing evaluation methodologies focus predominantly on traditional accuracy metrics, failing to capture the multifaceted human-centered qualities that…

METHOD

Full abstract

The integration of Large Language Models (LLMs) into recommendation systems has introduced unprecedented capabilities for natural language understanding, explanation generation, and conversational interactions. However, existing evaluation methodologies focus predominantly on traditional accuracy metrics, failing to capture the multifaceted human-centered qualities that determine the real-world user experience. We introduce HELM (Human-centered Evaluation for LLM-powered recoMmenders), a comprehensive evaluation framework that systematically assesses LLM-powered recommender systems across five human-centered dimensions: Intent Alignment, Explanation Quality, Interaction Naturalness, Trust & Transparency, and Fairness & Diversity. Through extensive experiments involving three state-of-the-art LLM-based recommenders (GPT-4, LLaMA-3.1, and P5) across three domains (movies, books, and restaurants), and rigorous evaluation by 12 domain experts using 847 recommendation scenarios, we demonstrate that HELM reveals critical quality dimensions invisible to traditional metrics. Our results show that while GPT-4 achieves superior explanation quality (4.21/5.0) and interaction naturalness (4.35/5.0), it exhibits a significant popularity bias (Gini coefficient 0.73) compared to traditional collaborative filtering (0.58). We release HELM as an open-source toolkit to advance human-centered evaluation practices in the recommender systems community.
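
The popularity-bias figures in the abstract are Gini coefficients. As a rough illustration only, the sketch below computes a Gini coefficient over per-item recommendation counts. The function name `gini` and the choice of exposure counts as the input are assumptions made for this example; the abstract does not state exactly how the paper computes its values.

```python
from typing import Sequence


def gini(counts: Sequence[float]) -> float:
    """Gini coefficient of non-negative per-item exposure counts.

    0.0 means every item is recommended equally often; values close to 1.0
    mean exposure is concentrated on a few items (hypothetical helper).
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    print(gini([5, 5, 5, 5]))    # evenly spread exposure
    print(gini([0, 0, 0, 10]))   # all exposure on one item
```

Under this definition, a higher value for one recommender than another (as with 0.73 versus 0.58 in the abstract) would indicate that its recommendations are concentrated on fewer items.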

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Through extensive experiments involving three state-of-the-art LLM-based recommenders (GPT-4, LLaMA-3.1, and P5) across three domains (movies, books, and restaurants), and rigorous evaluation by 12…

WHY NOW

Recommender Systems Evaluation moved forward this cycle; last verified April 2026. Public score 7.0/10.


Opportunity summary

Score 7.0

Pain HELM is an open-source toolkit for human-centered evaluation of LLM-powered recommender systems in real-world user experiences.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors



Paper Pack

10.48550/arXiv.2601.19197

HELM: A Human-Centered Evaluation Framework for LLM-Powered Recommender Systems

HELM is an open-source toolkit for human-centered evaluation of LLM-powered recommender systems in real-world user experiences.

Abstract

Source availability

PDF linked

The paper record includes a public PDF URL.

Extraction status

Parse run linked

A document parse run is attached to this paper.

Proof status

unverified

0 refs; 0 sources; 17% coverage.

What was readable

linked · on file · 6 anchors · derived fallback · 22 indexed · not indexed

Derived fallback: Estimated from adjacent evidence; not verified from source.

Viability

7.0

Time to MVP

MVP estimate missing

Commercial

No commercial flags on file


Claim map

Abstract-backed public claims while anchored extraction refreshes.

Strong 0 · Mixed 0 · Weak 4

Evidence (partial): HELM is an open-source toolkit for human-centered evaluation of LLM-powered recommender systems in real-world user experiences. However, existing evaluation methodologies focus predominantly on traditional accuracy metrics, failing to capture the multifaceted human-centered qualities that determine the real-world user experience.
Implication (partial): Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification: partial

Evidence (partial): The integration of Large Language Models (LLMs) into recommendation systems has introduced unprecedented capabilities for natural language understanding, explanation generation, and conversational interactions. However, existing evaluation methodologies focus predominantly on traditional accuracy metrics, failing to capture the multifaceted human-centered qualities that determine the real-world user experience.
Implication (partial): Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification: partial

Evidence (partial): ScienceToStartup currently rates this 7.0/10 on the public viability pass. Through extensive experiments involving three state-of-the-art LLM-based recommenders (GPT-4, LLaMA-3.1, and P5) across three domains (movies, books, and restaurants), and rigorous evaluation by 12 domain experts using 847 recommendation scenarios, we demonstrate that HELM reveals critical quality dimensions invisible to traditional metrics.
Implication (partial): Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification: partial

Evidence (partial): Recommender Systems Evaluation moved forward this cycle; last verified April 2026. Public score 7.0/10.
Implication (partial): Abstract-backed fallback claim; anchored extraction has not materialized a public claim row yet.
Verification: partial

Constellation map

Paper-native neighborhood for concepts, methods, materials, markets, and competitors. Missing lanes stay labeled instead of disappearing behind commercialization gates.


Concepts

not indexed

Methods

Materials

PDF linked · Document parse run

Markets

Recommender Systems Evaluation

Competitors

not indexed

Competitive landscape

HELM is an open-source toolkit for human-centered evaluation of LLM-powered recommender systems in real-world user experiences.

Segment

Recommender Systems Evaluation

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Buzz

No indexed public discussion is attached to 2601.19197 yet. That is a visibility signal, not a blank module: the monitor is watching the public channels below.

Hacker News

Not indexed yet

Bluesky

Not indexed yet


References(22)

Revealing Potential Biases in LLM-Based Recommender Systems in the Cold Start Setting

2025 · Alexandre Andre, Gauthier Roy et al.

Large Language Model Enhanced Recommender Systems: A Survey

2024 · Qidong Liu, Xiangyu Zhao et al.

On explaining recommendations with Large Language Models: a review

2024 · Alan Said

LLMRec: Large Language Models with Graph Augmentation for Recommendation

2023 · Wei Wei, Xubin Ren et al.

Evaluating ChatGPT as a Recommender System: A Rigorous Approach

2023 · Dario Di Palma, Giovanni Maria Biancofiore et al.

A survey on large language models for recommendation

2023 · Likang Wu, Zhilan Zheng et al.

Large Language Models are Zero-Shot Rankers for Recommender Systems

2023 · Yupeng Hou, Junjie Zhang et al.

TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation

2023 · Keqin Bao, Jizhi Zhang et al.

Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System

2023 · Yunfan Gao, Tao Sheng et al.

Explaining Recommendations through Conversations: Dialog Model and the Effects of Interface Type and Degree of Interactivity

2023 · Diana C. Hernandez-Bocanegra, J. Ziegler

Evaluating Recommender Systems: Survey and Framework

2022 · Eva Zangerle, Christine Bauer

A Survey on Trustworthy Recommender Systems

2022 · Yingqiang Ge, Shuchang Liu et al.

User Trust in Recommendation Systems: A comparison of Content-Based, Collaborative and Demographic Filtering

2022 · Mengqi Liao, S. Sundar et al.

A Technique for the Measurement of Attitudes

2022 · R. Likert

Human-Centered Recommender Systems: Origins, Advances, Challenges, and Opportunities

2021 · J. Konstan, L. Terveen

A Survey on Conversational Recommender Systems

2020 · D. Jannach, A. Manzoor et al.

Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects

2019 · Jianmo Ni, Jiacheng Li et al.

A user-centric evaluation framework for recommender systems

2011 · P. Pu, Li Chen et al.

Computing Krippendorff's Alpha-Reliability

2011 · K. Krippendorff

Explaining Recommendations

2007 · N. Tintarev

Showing 20 of 22 references

CITED BY

No citing papers are indexed in the public S2S graph yet. This is an explicit zero-signal state, not a hidden lookup.

Foundation

Prior Work: Lightweight Fairness for LLM-Based Recommendations via Kernelized Projection and Gated Adapters

7.0

Extension

Builds On This: Toward User Preference Alignment in LLM Recommendation via Explicit Context Feedback

6.0

Builds On This: Understanding LLM Evaluator Behavior: A Structured Multi-Evaluator Framework for Merchant Risk Assessment

3.0

Builds On This: LLM-Enhanced Reinforcement Learning for Long-Term User Satisfaction in Interactive Recommendation

6.0

Builds On This: Re-Centering Humans in LLM Personalization

3.0

Builds On This: LLM-Assisted Reranking to Operationalize Nuanced Objectives in Recommender Systems

5.0

Builds On This: RRCM: Ranking-Driven Retrieval over Collaborative and Meta Memories for LLM Recommendation

4.0

Builds On This: Assessing the Quality of Mental Health Support in LLM Responses through Multi-Attribute Human Evaluation

5.0

Builds On This: PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation

4.0

Commercially relevant

Higher Viability: ReFORM: Review-aggregated Profile Generation via LLM with Multi-Factor Attention for Restaurant Recommendation

8.0

Conflicting

none indexed


Agent drawer

5 surfaces preserved for agents. Humans can ignore.

Developer contracts, payload previews, evidence maps, and run controls stay here instead of the Read, Build, and Track workspace.

Run context

Paper: 2601.19197
Route: /paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems
Active tab: read
Artifact: helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems

Available agents

Read extractor
Build planner
Track monitor
Competitive mapper
Related-paper scout

API/MCP endpoints

REST paper pack API: /api/v1/paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems/paper-pack
REST build passport API: /api/v1/paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems/build-passport
REST OpenAPI: /api/openapi.json
MCP descriptor: /api/mcp
MCP resource: sciencetostartup://surfaces/paper-workspace

Tool contracts

paper_pack, build_passport, opportunity_kernel, foresight, source_proof, evidence_state

Payload preview

Inspect payload

{
  "contract_version": "paper-r2",
  "paper_id": "55d0695d-0af6-47a0-b8ac-22b8a902d2dd",
  "arxiv_id": "2601.19197",
  "canonical_route": "/paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems",
  "active_tab": "synced from current hash by the drawer client",
  "selected_artifact": "helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems",
  "endpoints": {
    "paper_pack": "/api/v1/paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems/paper-pack",
    "build_passport": "/api/v1/paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems/build-passport",
    "mcp_resource": "sciencetostartup://surfaces/paper-workspace"
  }
}

Schema validation

paper-r2 contract: present
JSON-LD twin: SSR emitted
OpenAPI path parity: /api/openapi.json
MCP resource parity: paper-workspace

Job trace

queued: drawer opened by user action
running: inspect or copy payload
succeeded: payload available in SSR
failed: route errors appear in evidence cards

Evidence map

sources used: page freshness, source proof anchors, JSON-LD
missing sources: exposed by PaperPack and EvidenceState chips
derived fallbacks: marked unverified before handoff

Page Freshness

Canonical route, proof status, last verified, refs, sources, and coverage.

Paper proof surface

Canonical route: /paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems

stale

Proof freshness: stale
Proof status: unverified
Display score: 7/10
Last proof check: 2026-04-02
Score updated: 2026-04-02
Score fresh until: 2026-05-02
References: 0
Source count: 0
Coverage: 17%

This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.

OpenAlex: pending — this preprint is not yet indexed by OpenAlex.

Agent Handoff

Endpoint list, payload shape, route context, and copyable handoff data.

HELM: A Human-Centered Evaluation Framework for LLM-Powered Recommender Systems

Canonical ID helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems | Route /paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems
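
The same routes can be assembled programmatically. The sketch below only builds URLs from a paper slug; the host comes from the curl example and the paths from the endpoint list on this page. The helper name `paper_endpoints`, and the assumption that the relative paper-pack and build-passport paths are served from that same host, are illustrative and not confirmed by the page. Response shapes are not documented here, so nothing is fetched or parsed.

```python
from urllib.parse import quote

# Host taken from the curl example on this page.
BASE_URL = "https://sciencetostartup.com"


def paper_endpoints(slug: str, base_url: str = BASE_URL) -> dict:
    """Build the REST URLs listed for a paper slug (hypothetical helper).

    Assumes the relative API paths are served from the same host as the
    agent-handoff route.
    """
    safe_slug = quote(slug, safe="")  # percent-encode anything unusual in the slug
    return {
        "agent_handoff": f"{base_url}/api/v1/agent-handoff/paper/{safe_slug}",
        "paper_pack": f"{base_url}/api/v1/paper/{safe_slug}/paper-pack",
        "build_passport": f"{base_url}/api/v1/paper/{safe_slug}/build-passport",
    }


if __name__ == "__main__":
    slug = "helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems"
    for name, url in paper_endpoints(slug).items():
        print(name, url)
```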

MCP example

{
  "tool": "get_paper",
  "arguments": {
    "arxiv_id": "2601.19197"
  }
}
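
The Model Context Protocol carries tool invocations as JSON-RPC 2.0 requests with the method `tools/call`. As a hedged sketch, the tool/arguments pair above could be wrapped as follows; the helper name `mcp_tool_call` and the request id are illustrative, and the transport this site's `/api/mcp` descriptor expects is not stated on the page.

```python
import json


def mcp_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Wrap a tool name and arguments in a JSON-RPC 2.0 `tools/call` request.

    Hypothetical helper: it only serializes the envelope and does not send it.
    """
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request, indent=2)


if __name__ == "__main__":
    print(mcp_tool_call("get_paper", {"arxiv_id": "2601.19197"}))
```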

source_context

{
  "surface": "paper",
  "mode": "paper",
  "query": "HELM: A Human-Centered Evaluation Framework for LLM-Powered Recommender Systems",
  "normalized_query": "2601.19197",
  "route": "/paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems",
  "paper_ref": "helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Buildability Receipt

Verdict, compute envelope, blockers, signature state, and receipt links.

Paper proof page receipt window

Watch and verify: HELM: A Human-Centered Evaluation Framework for LLM-Powered Recommender Systems

/buildability/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems

Watch

Subject: HELM: A Human-Centered Evaluation Framework for LLM-Powered Recommender Systems

Verdict

Watch

Verdict is Watch because viability or proof quality is intermediate and should be re-evaluated before execution.

Time to first demo

Insufficient data

No first-demo timestamp, owner estimate, or elapsed demo receipt is attached to this surface.

Compute envelope

Structured compute envelope

Insufficient data

No data, compute, hardware, memory, latency, dependency, or serving requirement receipt is attached.

Evidence ids

Receipt path

/buildability/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems

Paper ref

helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems

arXiv id

2601.19197

Freshness

Generated at

2026-04-02T02:30:40.136Z

Evidence freshness

stale

Last verification

2026-04-02T02:30:40.136Z

Sources

References

Coverage

17%

Hash state

Lineage hash

21ad950ceabc2433b6b279058fe763e16860103e1d7c155ca2a0ad53523d8671

Canonical opportunity-kernel lineage hash.

Signature state

External signature

unsigned_external

No founder, registry, pilot, or production-adoption signature is attached to this receipt.

Verification

not_verified

Verification is blocked until an external signature is provided.

Blockers

Missing: repo_url
Missing: references
Missing: proof_status
Missing: distribution_readiness_scores
Missing: paper_extraction_scorecards
Unknown: distribution readiness has not been computed yet
Unknown: proof verification has not been recorded yet

Verification pending / evidence receipt incomplete

repo_url

references

Missing proof, requirement, signature, approval, adoption, or telemetry fields are blockers and must not be inferred.


Source Proof anchors

Visual citations from the paper document graph.

6 anchors

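proof block · Page 3 · 70%

This equation captures one of the core mathematical components of the system. $\mathrm{HCS} = \sum_{d=1} S_d$ (fragment as extracted; the upper limit and any weights were not captured)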

Page and bbox are available; crop image is pending.

proof block · Page 6 · 82%

This equation captures one of the core mathematical components of the system. P5 achieves better fairness metrics (Gini = 0.61, Coverage = 24.2%)

Page and bbox are available; crop image is pending.

proof block · Page 8 · 82%

This equation captures one of the core mathematical components of the system. $\mathrm{Consistency} = \sum_{p \in P} \cos(r_{\mathrm{orig}}, r_p)$

Page and bbox are available; crop image is pending.
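
The consistency anchor above sums cosine similarities between an original response vector and each vector in a set P. A minimal sketch of that sum follows, assuming the r terms are numeric embedding vectors and P is a collection of perturbed or paraphrased variants; neither assumption is confirmed by the extracted text, and the extraction shows no normalization term, so none is applied.

```python
import math
from typing import Iterable, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def consistency(r_orig: Sequence[float], variants: Iterable[Sequence[float]]) -> float:
    """Sum of cos(r_orig, r_p) over all variants p, as in the extracted formula."""
    return sum(cosine(r_orig, r_p) for r_p in variants)


if __name__ == "__main__":
    # One identical variant and one orthogonal variant.
    print(consistency([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

Because the sum is unnormalized, its value grows with the number of variants; dividing by the size of P would give a mean, but that step is not shown in the extracted anchor.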

JSON-LD twin

The application/ld+json payload rendered for agents.

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebPage",
      "@id": "https://sciencetostartup.com/paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems#webpage",
      "url": "https://sciencetostartup.com/paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems",
      "name": "HELM: A Human-Centered Evaluation Framework for LLM-Powered Recommender Systems",
      "description": "HELM is an open-source toolkit for human-centered evaluation of LLM-powered recommender systems in real-world user experiences.",
      "isPartOf": {
        "@id": "https://sciencetostartup.com/#website"
      }
    },
    {
      "@type": "ScholarlyArticle",
      "@id": "https://sciencetostartup.com/paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems#scholarlyArticle",
      "headline": "HELM: A Human-Centered Evaluation Framework for LLM-Powered Recommender Systems",
      "description": "HELM is an open-source toolkit for human-centered evaluation of LLM-powered recommender systems in real-world user experiences.",
      "url": "https://sciencetostartup.com/paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems",
      "sameAs": "https://arxiv.org/abs/2601.19197",
      "identifier": {
        "@type": "PropertyValue",
        "propertyID": "arXiv",
        "value": "2601.19197"
      },
      "isAccessibleForFree": true,
      "isPartOf": {
        "@id": "https://sciencetostartup.com/#website"
      },
      "datePublished": "2026-01-27T04:53:48.000Z",
      "citation": [
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "02d5d70393ba7a46ba1e332e485b8b71c6f36c75"
          },
          "url": "https://www.semanticscholar.org/paper/02d5d70393ba7a46ba1e332e485b8b71c6f36c75"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "8bdbde264fb44104769e22f246a31de7f3b5237d"
          },
          "url": "https://www.semanticscholar.org/paper/8bdbde264fb44104769e22f246a31de7f3b5237d"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "053ad05ee2e4cfdbc13763843a92315dab4d5ec5"
          },
          "url": "https://www.semanticscholar.org/paper/053ad05ee2e4cfdbc13763843a92315dab4d5ec5"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "5aa3b1009955ce2c8f896e0d5e94e06155ef1e43"
          },
          "url": "https://www.semanticscholar.org/paper/5aa3b1009955ce2c8f896e0d5e94e06155ef1e43"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "f7d3c17ad1dee97377651e0f5646b3fc6d047fc0"
          },
          "url": "https://www.semanticscholar.org/paper/f7d3c17ad1dee97377651e0f5646b3fc6d047fc0"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "b486982fa7c68a8a08df1111ba9607119419c488"
          },
          "url": "https://www.semanticscholar.org/paper/b486982fa7c68a8a08df1111ba9607119419c488"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "f4e723958a93762befb4d4a039b44a7d752f9917"
          },
          "url": "https://www.semanticscholar.org/paper/f4e723958a93762befb4d4a039b44a7d752f9917"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "3487c12512fa41d3a4d64f00cb842525a8590ad3"
          },
          "url": "https://www.semanticscholar.org/paper/3487c12512fa41d3a4d64f00cb842525a8590ad3"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "0cfdd655100055f234fd23ebecd915504b8e00e3"
          },
          "url": "https://www.semanticscholar.org/paper/0cfdd655100055f234fd23ebecd915504b8e00e3"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "bd9bebd07e6d649b8ddd1019b78f56536d5773ce"
          },
          "url": "https://www.semanticscholar.org/paper/bd9bebd07e6d649b8ddd1019b78f56536d5773ce"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "ac2ae4d79546b6164a1612ed6d4f31948352df99"
          },
          "url": "https://www.semanticscholar.org/paper/ac2ae4d79546b6164a1612ed6d4f31948352df99"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "b20c87e2925a81f57d9c913423e9e74c14de5341"
          },
          "url": "https://www.semanticscholar.org/paper/b20c87e2925a81f57d9c913423e9e74c14de5341"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "6c57c9fe44b043673d416889147e76054a379c7f"
          },
          "url": "https://www.semanticscholar.org/paper/6c57c9fe44b043673d416889147e76054a379c7f"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "1059ea7dd0f9da54abacf55eb95c695facf49662"
          },
          "url": "https://www.semanticscholar.org/paper/1059ea7dd0f9da54abacf55eb95c695facf49662"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "1eb747ce0431f6f9c3a97fcea6f7b235191c3813"
          },
          "url": "https://www.semanticscholar.org/paper/1eb747ce0431f6f9c3a97fcea6f7b235191c3813"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "41d49ec6f73ab5621ab8e8cb5ddb677a886ccc76"
          },
          "url": "https://www.semanticscholar.org/paper/41d49ec6f73ab5621ab8e8cb5ddb677a886ccc76"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "a96898490180c86b63eee2e801de6e25de5aa71d"
          },
          "url": "https://www.semanticscholar.org/paper/a96898490180c86b63eee2e801de6e25de5aa71d"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "24e2b45304c119fe15539671363b3ba6c57f3580"
          },
          "url": "https://www.semanticscholar.org/paper/24e2b45304c119fe15539671363b3ba6c57f3580"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "e84e9f82f49eb5a21cdf2306e42478773ff9e82a"
          },
          "url": "https://www.semanticscholar.org/paper/e84e9f82f49eb5a21cdf2306e42478773ff9e82a"
        },
        {
          "@type": "ScholarlyArticle",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "SemanticScholar",
            "value": "7c363f962654392d67a3323cfeba4ae9cf1dec32"
          },
          "url": "https://www.semanticscholar.org/paper/7c363f962654392d67a3323cfeba4ae9cf1dec32"
        }
      ],
      "additionalProperty": [
        {
          "@type": "PropertyValue",
          "propertyID": "viabilityScore",
          "value": 7
        },
        {
          "@type": "PropertyValue",
          "propertyID": "researchDomain",
          "value": "Recommender Systems Evaluation"
        }
      ]
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "name": "Home",
          "item": "https://sciencetostartup.com"
        },
        {
          "@type": "ListItem",
          "position": 2,
          "name": "Recommender Systems Evaluation",
          "item": "https://sciencetostartup.com/topics"
        },
        {
          "@type": "ListItem",
          "position": 3,
          "name": "HELM: A Human-Centered Evaluation Framework for LLM-Powered ",
          "item": "https://sciencetostartup.com/paper/helm-a-human-centered-evaluation-framework-for-llm-powered-recommender-systems"
        }
      ]
    }
  ]
}
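
An agent consuming this payload might want the cited Semantic Scholar identifiers. The sketch below walks the `@graph` array and collects them; the function name `cited_semantic_scholar_ids` is hypothetical, and the code assumes the payload keeps the shape shown above (a `ScholarlyArticle` node with a `citation` list of `PropertyValue` identifiers).

```python
import json


def cited_semantic_scholar_ids(jsonld_text: str) -> list:
    """Collect SemanticScholar identifier values from ScholarlyArticle citations."""
    doc = json.loads(jsonld_text)
    ids = []
    for node in doc.get("@graph", []):
        if node.get("@type") != "ScholarlyArticle":
            continue  # skip WebPage and BreadcrumbList nodes
        for cited in node.get("citation", []):
            identifier = cited.get("identifier", {})
            if identifier.get("propertyID") == "SemanticScholar":
                ids.append(identifier["value"])
    return ids


if __name__ == "__main__":
    # Trimmed-down sample in the same shape as the payload above.
    sample = json.dumps({
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "WebPage", "name": "example"},
            {
                "@type": "ScholarlyArticle",
                "citation": [
                    {
                        "@type": "ScholarlyArticle",
                        "identifier": {
                            "@type": "PropertyValue",
                            "propertyID": "SemanticScholar",
                            "value": "02d5d70393ba7a46ba1e332e485b8b71c6f36c75",
                        },
                    }
                ],
            },
        ],
    })
    print(cited_semantic_scholar_ids(sample))
```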
