Abstract
Consider a family of sets and a single set, called the query set. How can one quickly find a member of the family which has a maximal intersection with the query set? Time constraints on the query and on a possible preprocessing of the set family make this problem challenging. Such maximal intersection queries arise in a wide range of applications, including web search, recommendation systems, and distributing on-line advertisements. In general, maximal intersection queries are computationally expensive. We investigate two well-motivated distributions over all families of sets and propose an algorithm for each of them. We show that with very high probability an almost optimal solution is found in time which is logarithmic in the size of the family. Moreover, we point out a threshold phenomenon on the probabilities of intersecting sets in each of our two input models which leads to the efficient algorithms mentioned above.
Similar content being viewed by others
References
Böhm, C., Berchtold, S., Keim, D.A.: Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33(3), 322–373 (2001)
Broder, A.: On the resemblance and containment of documents. In: SEQUENCES ’97: Proceedings of the Compression and Complexity of Sequences 1997, Washington, DC, USA, p. 21. IEEE Comput. Soc., Los Alamitos (1997)
Bruck, J., Naor, M.: The hardness of decoding linear codes with preprocessing. IEEE Trans. Inf. Theory 36(2), 381–385 (1990)
Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)
Clarkson, K.: Nearest-neighbor searching and metric space dimensions. In: Nearest-Neighbor Methods for Learning and Vision: Theory and Practice. MIT Press, Cambridge (2006)
Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC ’04: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, New York, NY, USA, pp. 91–100. Assoc. Comput. Mach., New York (2004)
Fagin, R., Kumar, R., Sivakumar, D.: Efficient similarity search and classification via rank aggregation. In: SIGMOD ’03: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, pp. 301–312. Assoc. Comput. Mach., New York (2003)
Goyal, N., Lifshits, Y., Schütze, H.: Disorder inequality: a combinatorial approach to nearest neighbor search. In: WSDM, pp. 25–32 (2008)
Hjaltason, G.R., Samet, H.: Index-driven similarity search in metric spaces (survey article). ACM Trans. Database Syst. 28(4), 517–580 (2003)
Hoffmann, B., Lifshits, Y., Nowotka, D.: Maximal intersection queries in randomized graph models. In: Diekert, V., Volkov, M.V., Voronkov, A. (eds.) CSR. Lecture Notes in Computer Science, vol. 4649, pp. 227–236. Springer, Berlin (2007)
Indyk, P.: Nearest neighbors in high-dimensional spaces. In: Goodman, J.E., O’Rourke, J. (eds.) Handbook of Discrete and Computational Geometry, 2nd edn. CRC Press, Boca Raton (2004). Chap. 39
Karger, D.R., Ruhl, M.: Finding nearest neighbors in growth-restricted metrics. In: STOC ’02: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, New York, NY, USA, pp. 741–750. Assoc. Comput. Mach., New York (2002)
Kleinberg, J.M.: Two algorithms for nearest-neighbor search in high dimensions. In: STOC ’97: Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, New York, NY, USA, pp. 599–608. Assoc. Comput. Mach., New York (1997)
Krauthgamer, R., Lee, J.R.: Navigating nets: simple algorithms for proximity search. In: SODA ’04: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, pp. 798–807. SIAM, Philadelphia (2004)
Lifshits, Y., Nowotka, D.: Estimation of the click volume by large scale regression analysis. In: Diekert, V., Volkov, M.V., Voronkov, A. (eds.) CSR. Lecture Notes in Computer Science, vol. 4649, pp. 216–226. Springer, Berlin (2007)
Maaß, M.G., Nowak, J.: A new method for approximate indexing and dictionarylookup with one error. Inf. Process. Lett. 96(5), 185–191 (2005)
Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
Newman, M.E.J.: The structure and function of complex networks. SIAM Rev. 45, 167 (2003)
Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search—The Metric Space Approach, vol. 32. Springer, Berlin (2006)
Zobel, J., Moffat, A.: Inverted files for text search engines. ACM Comput. Surv. 38(2), 6 (2006)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Hoffmann, B., Lifshits, M., Lifshits, Y. et al. Maximal Intersection Queries in Randomized Input Models. Theory Comput Syst 46, 104–119 (2010). https://doi.org/10.1007/s00224-008-9154-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00224-008-9154-6