Skip to Main content Skip to Navigation
New interface
Conference papers

A Scalable MapReduce Similarity Join Computation Using LSH

Abstract : Similarity joins are recognized to be among the most useful data process- ing and analysis operations. A similarity join is used to retrieve all data pairs whose distances are smaller than a predefined threshold λ . In this paper, we introduce the MRS-join algorithm to perform similarity joins on large trajectories datasets. The MapReduce model and a randomized LSH (Local Sensitive Hashing) keys redistribution approach are used to balance load among pro- cessing nodes while reducing communications and computations to almost all rele- vant data by using distributed histograms. A cost analysis of the MRS-join algorithm shows that our approach is insensitive to data skew and guarantees perfect balancing properties, in large scale systems, during all stages of similarity join computations. These performances have been confirmed by a series of experiments using the Fréchet distance on large datasets of trajectories from real world and synthetic data benchmarks.
Complete list of metadata
Contributor : Sébastien Limet Connect in order to contact the contributor
Submitted on : Friday, January 14, 2022 - 4:23:24 PM
Last modification on : Friday, March 25, 2022 - 12:24:18 PM


  • HAL Id : hal-03526678, version 1


Sébastien Rivault, Mostafa Bamha, Sophie Robert, Sébastien Limet. A Scalable MapReduce Similarity Join Computation Using LSH. HLPP 2021: 14th International Symposium on High-Level Parallel Programming and Applications, 2021, Cluj-Napoca, Romania. ⟨hal-03526678⟩



Record views