ARXIV:2603.25168 · SCENE TEXT ANALYSIS · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

ET-SAM: Efficient Point Prompt Prediction in SAM for Unified Scene Text Detection and Layout Analysis

Xike Zhang · Maoyuan Ye · Juhua Liu · Bo Du · arXiv

ET-SAM accelerates scene text detection and layout analysis by efficiently predicting point prompts, enabling faster inference and better data utilization.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: ET-SAM accelerates scene text detection and layout analysis by efficiently predicting point prompts, enabling faster inference and better data utilization.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: Evidence unverified

PROBLEM

ET-SAM accelerates scene text detection and layout analysis by efficiently predicting point prompts, enabling faster inference and better data utilization. However, the typical reliance on pixel-level text segmentation for sampling thousands of foreground points…

METHOD

Full abstract

Previous works based on the Segment Anything Model (SAM) have achieved promising performance in unified scene text detection and layout analysis. However, their typical reliance on pixel-level text segmentation to sample thousands of foreground points as prompts leads to unsatisfactory inference latency and limited data utilization. To address these issues, we propose ET-SAM, an Efficient framework with two decoders for unified scene Text detection and layout analysis based on SAM. Technically, we customize a lightweight point decoder that produces word heatmaps to obtain a small number of foreground points, thereby eliminating excessive point prompts and accelerating inference. Freed from the dependence on pixel-level segmentation, we further design a joint training strategy to leverage existing data with heterogeneous text-level annotations. Specifically, datasets with multi-level, word-level-only, and line-level-only annotations are combined in parallel as a unified training set. For these datasets, we introduce three corresponding sets of learnable task prompts in both the point decoder and the hierarchical mask decoder to mitigate discrepancies across datasets. Extensive experiments demonstrate that, compared to the previous SAM-based architecture, ET-SAM achieves about 3$\times$ inference acceleration while obtaining competitive performance on HierText, and improves F-score by an average of 11.0% on Total-Text, CTW1500, and ICDAR15.
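The abstract does not say how a word heatmap is turned into "a few foreground points". As a rough illustration only, the sketch below assumes the simplest plausible rule: keep thresholded local maxima of the heatmap. The function name `pick_point_prompts`, the 3x3 neighbourhood, the threshold, and the toy heatmap are all hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Heatmap = List[List[float]]


def pick_point_prompts(heatmap: Heatmap, threshold: float = 0.5,
                       max_points: int = 100) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for local maxima scoring at least `threshold`.

    A cell counts as a peak when no cell in its 3x3 neighbourhood scores
    strictly higher (so a flat plateau yields several peaks). This selection
    rule is an assumption for illustration, not the paper's method.
    """
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    # Strongest peaks first, capped at max_points.
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks[:max_points]


# Hypothetical 4x6 heatmap with two word centres.
toy = [
    [0.0, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.3, 0.1],
    [0.0, 0.2, 0.1, 0.3, 0.8, 0.2],
    [0.0, 0.0, 0.0, 0.1, 0.2, 0.1],
]
points = pick_point_prompts(toy)
print(points)  # [(1, 1, 0.9), (2, 4, 0.8)]
```

With this rule the number of prompts scales with the number of detected words rather than with the number of foreground pixels, which is the kind of reduction the abstract credits for the faster inference.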

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. For these datasets, we introduce three corresponding sets of learnable task prompts in both the point decoder and hierarchical mask decoder to mitigate discrepancies…
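The three sets of learnable task prompts can be pictured as a lookup keyed by a dataset's annotation type. The sketch below is a plain-Python illustration under stated assumptions: the names `init_task_prompts` and `build_decoder_tokens`, the token count, the embedding size, and the choice to prepend task prompts to point tokens are all hypothetical. In a real model the prompt values would be trainable parameters rather than fixed random lists.

```python
import random
from typing import Dict, List

Tokens = List[List[float]]

# One prompt set per annotation type named in the abstract.
ANNOTATION_TYPES = ("multi_level", "word_only", "line_only")


def init_task_prompts(num_tokens: int, dim: int, seed: int = 0) -> Dict[str, Tokens]:
    """Create one small set of prompt embeddings per annotation type."""
    rng = random.Random(seed)
    return {
        kind: [[rng.gauss(0.0, 0.02) for _ in range(dim)]
               for _ in range(num_tokens)]
        for kind in ANNOTATION_TYPES
    }


def build_decoder_tokens(task_prompts: Dict[str, Tokens],
                         annotation_type: str,
                         point_tokens: Tokens) -> Tokens:
    """Prepend the task prompts matching the dataset to the point tokens.

    Prepending is an assumption made for this sketch; the abstract only
    states that task prompts are introduced in both decoders.
    """
    if annotation_type not in task_prompts:
        raise KeyError(f"unknown annotation type: {annotation_type}")
    return task_prompts[annotation_type] + point_tokens


prompts = init_task_prompts(num_tokens=2, dim=4)
# Three hypothetical point tokens from a word-level-only dataset.
word_points = [[0.0] * 4 for _ in range(3)]
tokens = build_decoder_tokens(prompts, "word_only", word_points)
print(len(tokens))  # 5
```

Keying the prompts by annotation type lets a single batch loop serve all three dataset kinds while still telling the decoders which supervision a sample carries.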

WHY NOW

Scene Text Analysis moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score: 7.0

Pain: ET-SAM accelerates scene text detection and layout analysis by efficiently predicting point prompts, enabling faster inference and better data utilization.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: no shell-level blocker reported

Competitive landscape

ET-SAM accelerates scene text detection and layout analysis by efficiently predicting point prompts, enabling faster inference and better data utilization.

Segment: Scene Text Analysis

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Published 2026-03-26 · arXiv: https://arxiv.org/abs/2603.25168
