ARXIV:2603.24384 · DATA MINING · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

On the Use of Bagging for Local Intrinsic Dimensionality Estimation

Kristóf Péter · Ricardo J. G. B. Campello · James Bailey · Michael E. Houle · arXiv

A theoretical framework for improving local intrinsic dimensionality estimation using bagging to reduce variance and mean squared error.

Blocked on Code | Score 3.0 | Evidence unverified

Opportunity summary

Pain: A theoretical framework for improving local intrinsic dimensionality estimation using bagging to reduce variance and mean squared error.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: Evidence unverified

PROBLEM

A theoretical framework for improving local intrinsic dimensionality estimation using bagging to reduce variance and mean squared error. Accurate LID estimation requires samples drawn from small neighborhoods around each query to avoid biases from…

METHOD

Full abstract

The theory of Local Intrinsic Dimensionality (LID) has become a valuable tool for characterizing local complexity within and across data manifolds, supporting a range of data mining and machine learning tasks. Accurate LID estimation requires samples drawn from small neighborhoods around each query to avoid biases from nonlocal effects and potential manifold mixing, yet limited data within such neighborhoods tends to cause high estimation variance. As a variance reduction strategy, we propose an ensemble approach that uses subbagging to preserve the local distribution of nearest neighbor (NN) distances. The main challenge is that the uniform reduction in total sample size within each subsample increases the proximity threshold for finding a fixed number k of NNs around the query. As a result, in the specific context of LID estimation, the sampling rate has an additional, complex interplay with the neighborhood size, where both combined determine the sample size as well as the locality and resolution considered for estimation. We analyze both theoretically and experimentally how the choice of the sampling rate and the k-NN size used for LID estimation, alongside the ensemble size, affects performance, enabling informed prior selection of these hyper-parameters depending on application-based preferences. Our results indicate that within broad and well-characterized regions of the hyper-parameters space, using a bagged estimator will most often significantly reduce variance as well as the mean squared error when compared to the corresponding non-bagged baseline, with controllable impact on bias. We additionally propose and evaluate different ways of combining bagging with neighborhood smoothing for substantial further improvements on LID estimation performance.
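The subbagging idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name a base estimator or an aggregation rule, so the sketch assumes the standard maximum-likelihood (Hill-type) LID estimator, Euclidean distances, and a plain mean over bags. The function names `knn_distances`, `lid_mle`, and `bagged_lid`, and the parameters `rate` and `n_bags`, are hypothetical.

```python
import numpy as np


def knn_distances(data, query, k):
    """Return the k smallest positive Euclidean distances from query to data, sorted."""
    d = np.linalg.norm(data - query, axis=1)
    d = d[d > 0.0]  # ignore the query itself and exact duplicates
    if d.size < k:
        raise ValueError("not enough distinct points for k neighbors")
    return np.sort(d)[:k]


def lid_mle(dists):
    """MLE (Hill-type) LID estimate from ascending k-NN distances.

    Assumed base estimator: -1 / mean(log(r_i / r_k)), with r_k the largest distance.
    """
    r = np.asarray(dists, dtype=float)
    return -1.0 / np.mean(np.log(r / r[-1]))


def bagged_lid(data, query, k, rate, n_bags, rng):
    """Average the base estimate over n_bags subsamples drawn WITHOUT replacement.

    rate   : fraction of the data kept in each subsample (the sampling rate)
    n_bags : ensemble size
    Averaging by the mean is an assumption of this sketch.
    """
    n = data.shape[0]
    m = int(round(rate * n))
    if not k < m <= n:
        raise ValueError("need k < rate * n <= n")
    estimates = []
    for _ in range(n_bags):
        idx = rng.choice(n, size=m, replace=False)  # subbagging: no replacement
        estimates.append(lid_mle(knn_distances(data[idx], query, k)))
    return float(np.mean(estimates))


# Toy data: a 2-D uniform square embedded in 5-D, so the local dimension is 2.
rng = np.random.default_rng(0)
data = np.zeros((2000, 5))
data[:, :2] = rng.uniform(-1.0, 1.0, size=(2000, 2))
query = np.zeros(5)

print("non-bagged:", lid_mle(knn_distances(data, query, k=20)))
print("bagged:    ", bagged_lid(data, query, k=20, rate=0.5, n_bags=30, rng=rng))
```

Note how the sketch exposes the interplay the abstract describes: each bag holds only `rate * n` points, so the k-th neighbor in a bag lies farther from the query than the k-th neighbor in the full data, which widens the neighborhood used for estimation.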

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass. As a result, in the specific context of LID estimation, the sampling rate has an additional, complex interplay with the neighborhood size, where both…

WHY NOW

Data Mining moved forward this cycle; last verified April 2026. Public score 3.0/10.

Competitive landscape

A theoretical framework for improving local intrinsic dimensionality estimation using bagging to reduce variance and mean squared error.

Segment: Data Mining

Adoption evidence: No public code link in the paper record yet

Commercial read: 3.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Published: 2026-03-25T15:03:25.000Z

arXiv: https://arxiv.org/abs/2603.24384

Page: https://sciencetostartup.com/paper/on-the-use-of-bagging-for-local-intrinsic-dimensionality-estimation
