ARXIV:2606.06679 · LEGAL AI · SUBMITTED 08 JUN · 20:17 UTC · FRESHNESS FRESH

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

HKJudge: A Legal Discourse-Annotated Corpus for Interpreting What Courts Find, How They Reason, and What They Rule

Xi Xuan · Wenxin Zhang · Yufei Zhou · King-kui Sin · Chunyu Kit · arXiv

Introducing HKJudge, a novel expert-annotated legal discourse corpus and benchmark for interpreting court judgments, with available code.

Ship in 2-4 weeks · Score 8.0 · Evidence unverified

Opportunity summary

Pain Introducing HKJudge, a novel expert-annotated legal discourse corpus and benchmark for interpreting court judgments, with available code.

Evidence 0 refs | 4 sources | 67% coverage

Blocker Evidence unverified


PROBLEM

Discourse analysis of Hong Kong court judgments has received limited attention, owing largely to the absence of expert-annotated corpora. HKJudge, the Hong Kong Judgment Discourse Dataset, is presented as the first sentence-level expert-annotated legal discourse corpus, released with a benchmark and code.

METHOD

Full abstract

Court judgments are central to legal practice and jurisprudence, yet discourse analysis of Hong Kong judgments has received limited attention, owing largely to the absence of expert-annotated corpora. We introduce the Hong Kong Judgment Discourse Dataset (HKJudge), the first sentence-level expert-annotated legal discourse corpus. HKJudge includes criminal judgments across all five levels of HK's court hierarchy, comprising $\sim$290k sentences and $\sim$6.5 million tokens, fully annotated by legal linguistics experts. We design a two-tier discourse schema that captures what facts a court finds, how it reasons, and what it rules. At the sentence level, each sentence is assigned one of 26 rhetorical roles. At the span level, sentences are further annotated with three sentencing elements (charge, imprisonment term, fine). Ten legal linguistics annotators produced the annotations with an inter-annotator agreement of $\kappa = 0.8$. We formulate two tasks on HKJudge, termed rhetorical role classification and legal element extraction, and provide the first benchmark evaluation of four BERT-based models, two open-source LLMs under zero-shot and fine-tuning settings, and four commercial LLMs on both tasks. Our work demonstrates the value of sentence-level discourse annotation for modeling the structure of HK judgments and provides a rich data foundation for future work on legal judgment prediction. The HKJudge dataset and code are available at https://github.com/xuanxixi/HKJudge.
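To make the two-tier schema and the agreement figure concrete, here is a minimal Python sketch. It is not the HKJudge file format or the authors' code: the class names, the role labels (`FACT`, `REASONING`, `RULING`, `SENTENCE_IMPOSED`), the example sentence, and the character-offset span encoding are all assumptions made for illustration. The abstract also does not say which kappa variant was used for ten annotators, so the function below shows plain two-annotator Cohen's kappa only as a reference point.

```python
from collections import Counter
from dataclasses import dataclass, field

# The three span-level sentencing elements named in the abstract.
# The identifier spellings are an assumption, not the dataset's label set.
ELEMENT_TYPES = {"charge", "imprisonment_term", "fine"}


@dataclass
class ElementSpan:
    start: int  # character offset into the sentence, inclusive
    end: int    # character offset into the sentence, exclusive
    type: str   # one of ELEMENT_TYPES


@dataclass
class AnnotatedSentence:
    text: str
    role: str  # tier 1: one rhetorical role per sentence (label names are hypothetical)
    spans: list[ElementSpan] = field(default_factory=list)  # tier 2: sentencing elements

    def extract(self) -> dict[str, list[str]]:
        """Group the annotated span texts by element type."""
        out: dict[str, list[str]] = {}
        for span in self.spans:
            if span.type not in ELEMENT_TYPES:
                raise ValueError(f"unknown element type: {span.type}")
            out.setdefault(span.type, []).append(self.text[span.start:span.end])
        return out


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same sentences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented example sentence; offsets are computed rather than hand-counted.
text = "The defendant was sentenced to 18 months' imprisonment and fined $5,000."
term, fine = "18 months' imprisonment", "$5,000"
sentence = AnnotatedSentence(
    text=text,
    role="SENTENCE_IMPOSED",
    spans=[
        ElementSpan(text.index(term), text.index(term) + len(term), "imprisonment_term"),
        ElementSpan(text.index(fine), text.index(fine) + len(fine), "fine"),
    ],
)
elements = sentence.extract()

# Two hypothetical annotators agreeing on 3 of 4 sentences.
kappa = cohen_kappa(
    ["FACT", "FACT", "REASONING", "RULING"],
    ["FACT", "REASONING", "REASONING", "RULING"],
)
```

In this toy case the annotators agree on 75% of sentences, but chance agreement is 5/16, so kappa comes out at 7/11 (about 0.64). The same correction is why a reported value of 0.8 indicates stronger agreement than raw percentage overlap alone would suggest.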

RESULT

ScienceToStartup currently rates this 8.0/10 on the public viability pass. Our work demonstrates the value of sentence-level discourse annotation for modeling the structure of HK judgments and provides a rich data foundation for future…

WHY NOW

Legal AI moved forward this cycle; last verified June 2026. Public score 8.0/10. Implementation evidence is present through a linked repository.


Opportunity summary

Score 8.0

Pain Introducing HKJudge, a novel expert-annotated legal discourse corpus and benchmark for interpreting court judgments, with available code.

Evidence 0 refs | 4 sources | 67% coverage

Blocker no shell-level blocker reported


Competitive landscape

Introducing HKJudge, a novel expert-annotated legal discourse corpus and benchmark for interpreting court judgments, with available code.

Segment: Legal AI

Adoption evidence: Public code linked for build inspection

Commercial read: 8.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Published 2026-06-04 19:53 UTC · arXiv https://arxiv.org/abs/2606.06679 · Code https://github.com/xuanxixi/HKJudge
