ARXIV:2602.23234 · SEARCH AND RECOMMENDATION OPTIMIZATION · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified: Source: PDF linked | Partial: Paper Pack: 3 of 4 citation fields filled | Missing: Missing fields: authors | Partial: Proof: unverified proof status

Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments

arXiv

Enhance app store relevance with LLM-generated textual judgments for improved search ranking.

Blocked on Code | Score 7.0 | Evidence unverified

Opportunity summary

Pain Enhance app store relevance with LLM-generated textual judgments for improved search ranking.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified


PROBLEM

Enhance app store relevance with LLM-generated textual judgments for improved search ranking. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users tend to click or download) and textual relevance (a result's…

METHOD

Full abstract

Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users tend to click or download) and textual relevance (a result's semantic fit to the query). A persistent challenge is the scarcity of expert-provided textual relevance labels relative to abundant behavioral relevance labels. We first address this by systematically evaluating LLM configurations, finding that a specialized, fine-tuned model significantly outperforms a much larger pre-trained one in providing highly relevant labels. Using this optimal model as a force multiplier, we generate millions of textual relevance labels to overcome the data scarcity. We show that augmenting our production ranker with these textual relevance labels leads to a significant outward shift of the Pareto frontier: offline NDCG improves for behavioral relevance while simultaneously increasing for textual relevance. These offline gains were validated by a worldwide A/B test on the App Store ranker, which demonstrated a statistically significant +0.24% increase in conversion rate, with the most substantial performance gains occurring in tail queries, where the new textual relevance labels provide a robust signal in the absence of reliable behavioral relevance labels.
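The abstract reports offline NDCG improving on both objectives at once, which it describes as an outward shift of the Pareto frontier. The sketch below illustrates that idea only: it computes NDCG with a common linear-gain, log2-discount formulation (an assumption, since the abstract does not state which variant the authors use) and checks whether one ranking dominates another on the two objectives. The app names, label values, and both orderings are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def dcg(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (log2 discount)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg(gains: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering; 0 if no gain."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


def score(order: List[str],
          behavioral: Dict[str, float],
          textual: Dict[str, float],
          k: int) -> Tuple[float, float]:
    """Return (behavioral NDCG@k, textual NDCG@k) for one ranked list."""
    return (
        ndcg([behavioral[app] for app in order], k),
        ndcg([textual[app] for app in order], k),
    )


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


# Hypothetical labels for a single query (illustrative values only).
behavioral = {"app_a": 1.0, "app_b": 0.0, "app_c": 1.0}  # e.g. clicked/downloaded
textual = {"app_a": 2.0, "app_b": 0.0, "app_c": 1.0}     # e.g. graded semantic fit

# Two hypothetical orderings of the same three results.
baseline = score(["app_b", "app_a", "app_c"], behavioral, textual, k=3)
augmented = score(["app_a", "app_c", "app_b"], behavioral, textual, k=3)

print("baseline  (behavioral, textual):", baseline)
print("augmented (behavioral, textual):", augmented)
print("augmented dominates baseline:", dominates(augmented, baseline))
```

In practice the two NDCG values would be averaged over many queries before comparing rankers; the single-query case here keeps the arithmetic easy to follow.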

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users tend to click or download) and textual relevance (a result's semantic fit…

WHY NOW

Search and Recommendation Optimization moved forward this cycle; last verified April 2026. Public score 7.0/10.


Opportunity summary

Score 7.0

Pain Enhance app store relevance with LLM-generated textual judgments for improved search ranking.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

Enhance app store relevance with LLM-generated textual judgments for improved search ranking.


Competitive landscape

Enhance app store relevance with LLM-generated textual judgments for improved search ranking.

Segment

Search and Recommendation Optimization

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

References (17)

Modeling Ranking Properties with In-Context Learning

Nilanjan Sinhababu, Andrew Parry et al. (2025)

A Generative Re-ranking Model for List-level Multi-objective Optimization at Taobao

Yue Meng, Cheng Guo et al. (2025)

Benchmarking LLM-based Relevance Judgment Methods

Negar Arabzadeh, Charles L. A. Clarke (2025)

RRADistill: Distilling LLMs’ Passage Ranking Ability for Long-Tail Queries Document Re-Ranking on a Search Engine

Nayoung Choi, Youngjune Lee et al. (2024)

Permutative Preference Alignment from Listwise Ranking of Human Judgments

Yang Zhao, Yixin Wang et al. (2024)

Towards More Relevant Product Search Ranking Via Large Language Models: An Empirical Study

Qi Liu, Atul Singh et al. (2024)

Multi-objective Learning to Rank by Model Distillation

Jie Tang, Huiji Gao et al. (2024)

TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy

Yiqun Chen, Qi Liu et al. (2024)

Beyond Yes and No: Improving Zero-Shot LLM Rankers via Scoring Fine-Grained Relevance Labels

Honglei Zhuang, Zhen Qin et al. (2023)

A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models

Shengyao Zhuang, Honglei Zhuang et al. (2023)

RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models

Ronak Pradeep, Sahel Sharifymoghaddam et al. (2023)

Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting

Zhen Qin, R. Jagerman et al. (2023)

Can Large Language Models Be an Alternative to Human Evaluations?

Cheng-Han Chiang, Hung-yi Lee (2023)

Perspectives on Large Language Models for Relevance Judgment

G. Faggioli, Laura Dietz et al. (2023)

G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

Yang Liu, Dan Iter et al. (2023)

One-Shot Labeling for Automatic Relevance Estimation

Sean MacAvaney, Luca Soldaini (2023)

Cumulated gain-based evaluation of IR techniques

K. Järvelin, Jaana Kekäläinen (2002)


Authors

Evangelia Christakopoulou, Vivekkumar Patel, Hemanth Velaga, Sandip Gaikwad (Apple)

Published

2026-02-26 17:11:26 UTC

Source

https://arxiv.org/abs/2602.23234

FAQ

What is the startup potential of "Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments"?

Enhance app store relevance with LLM-generated textual judgments for improved search ranking.

What products could be built from this research?

Develop an API that offers LLM-generated relevance labels for digital content platforms, allowing easy integration to enhance the ranking capabilities of search engines.

What are the practical use cases?

Commercialize this as a B2B service for digital marketplaces to enhance their search rankings using LLM-generated relevance labels, thus increasing conversions and user satisfaction.

What industries could this research disrupt?

It replaces the traditional human-dependent relevance labeling process with an automated, scalable solution that reduces operational costs and enhances performance metrics.

