Journal of Dis- crete Algorithms, Elsevier

Pivot selection: Di- mension reduction for distance-based indexing

作者:
R. Mao W. L. Miranker and D. P. Miranker.

关键词:
Similarity queryMetric spaceDimension reductionIntrinsic dimensionPivot selectionPivot space model

摘要:
Distance-based indexing exploits only the triangle inequality to answer similarity queries in metric spaces. Lacking coordinate structure, mathematical tools in R n R n mathContainer Loading Mathjax can only be applied indirectly, making it difficult to theoretically study metric-space indexing. Toward solving this problem, a common algorithmic step is to select a small number of special points, called pivots 02 , and map the data objects to a low-dimensional space, one dimension for each pivot, where each dimension represents the distances of a pivot to the data objects. We formalize a “pivot space model” where all the data objects are used as pivots such that data is mapped from metric space to R n R n mathContainer Loading Mathjax , preserving all the pairwise distances under L ∞ L ∞ mathContainer Loading Mathjax . With this model, it can be shown that the indexing problem in metric space can be equivalently studied in R n R n mathContainer Loading Mathjax . Further, we show the necessity of dimension reduction for R n R n mathContainer Loading Mathjax and that the only effective form of dimension reduction is to select existing dimensions, i.e. pivot selection. The coordinate structure of R n R n mathContainer Loading Mathjax makes the application of many mathematical tools possible. In particular, Principle Component Analysis (PCA) is incorporated into a heuristic method for pivot selection and shown to be effective over a large range of workloads. We also show that PCA can be used to reliably measure the intrinsic dimension of a metric space.

在线下载

相关文章:
在线客服:
对外合作:
联系方式:400-6379-560
投诉建议:feedback@hanspub.org
客服号

人工客服,优惠资讯,稿件咨询
公众号

科技前沿与学术知识分享