Maximal Intersection Queries in Randomized Input Models

Hoffmann, Benjamin; Lifshits, Mikhail; Lifshits, Yury; Nowotka, Dirk

doi:10.1007/s00224-008-9154-6

Maximal Intersection Queries in Randomized Input Models

Published: 30 October 2008

Volume 46, pages 104–119, (2010)
Cite this article

Theory of Computing Systems Aims and scope Submit manuscript

Benjamin Hoffmann¹,
Mikhail Lifshits²,
Yury Lifshits³ &
…
Dirk Nowotka¹

56 Accesses
4 Citations
Explore all metrics

Abstract

Consider a family of sets and a single set, called the query set. How can one quickly find a member of the family which has a maximal intersection with the query set? Time constraints on the query and on a possible preprocessing of the set family make this problem challenging. Such maximal intersection queries arise in a wide range of applications, including web search, recommendation systems, and distributing on-line advertisements. In general, maximal intersection queries are computationally expensive. We investigate two well-motivated distributions over all families of sets and propose an algorithm for each of them. We show that with very high probability an almost optimal solution is found in time which is logarithmic in the size of the family. Moreover, we point out a threshold phenomenon on the probabilities of intersecting sets in each of our two input models which leads to the efficient algorithms mentioned above.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Böhm, C., Berchtold, S., Keim, D.A.: Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33(3), 322–373 (2001)
Article Google Scholar
Broder, A.: On the resemblance and containment of documents. In: SEQUENCES ’97: Proceedings of the Compression and Complexity of Sequences 1997, Washington, DC, USA, p. 21. IEEE Comput. Soc., Los Alamitos (1997)
Google Scholar
Bruck, J., Naor, M.: The hardness of decoding linear codes with preprocessing. IEEE Trans. Inf. Theory 36(2), 381–385 (1990)
Article MATH MathSciNet Google Scholar
Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)
Article Google Scholar
Clarkson, K.: Nearest-neighbor searching and metric space dimensions. In: Nearest-Neighbor Methods for Learning and Vision: Theory and Practice. MIT Press, Cambridge (2006)
Google Scholar
Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC ’04: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, New York, NY, USA, pp. 91–100. Assoc. Comput. Mach., New York (2004)
Chapter Google Scholar
Fagin, R., Kumar, R., Sivakumar, D.: Efficient similarity search and classification via rank aggregation. In: SIGMOD ’03: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, pp. 301–312. Assoc. Comput. Mach., New York (2003)
Chapter Google Scholar
Goyal, N., Lifshits, Y., Schütze, H.: Disorder inequality: a combinatorial approach to nearest neighbor search. In: WSDM, pp. 25–32 (2008)
Hjaltason, G.R., Samet, H.: Index-driven similarity search in metric spaces (survey article). ACM Trans. Database Syst. 28(4), 517–580 (2003)
Article Google Scholar
Hoffmann, B., Lifshits, Y., Nowotka, D.: Maximal intersection queries in randomized graph models. In: Diekert, V., Volkov, M.V., Voronkov, A. (eds.) CSR. Lecture Notes in Computer Science, vol. 4649, pp. 227–236. Springer, Berlin (2007)
Google Scholar
Indyk, P.: Nearest neighbors in high-dimensional spaces. In: Goodman, J.E., O’Rourke, J. (eds.) Handbook of Discrete and Computational Geometry, 2nd edn. CRC Press, Boca Raton (2004). Chap. 39
Google Scholar
Karger, D.R., Ruhl, M.: Finding nearest neighbors in growth-restricted metrics. In: STOC ’02: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, New York, NY, USA, pp. 741–750. Assoc. Comput. Mach., New York (2002)
Chapter Google Scholar
Kleinberg, J.M.: Two algorithms for nearest-neighbor search in high dimensions. In: STOC ’97: Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, New York, NY, USA, pp. 599–608. Assoc. Comput. Mach., New York (1997)
Chapter Google Scholar
Krauthgamer, R., Lee, J.R.: Navigating nets: simple algorithms for proximity search. In: SODA ’04: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, pp. 798–807. SIAM, Philadelphia (2004)
Google Scholar
Lifshits, Y., Nowotka, D.: Estimation of the click volume by large scale regression analysis. In: Diekert, V., Volkov, M.V., Voronkov, A. (eds.) CSR. Lecture Notes in Computer Science, vol. 4649, pp. 216–226. Springer, Berlin (2007)
Google Scholar
Maaß, M.G., Nowak, J.: A new method for approximate indexing and dictionarylookup with one error. Inf. Process. Lett. 96(5), 185–191 (2005)
Article Google Scholar
Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
MATH Google Scholar
Newman, M.E.J.: The structure and function of complex networks. SIAM Rev. 45, 167 (2003)
Article MATH MathSciNet Google Scholar
Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search—The Metric Space Approach, vol. 32. Springer, Berlin (2006)
MATH Google Scholar
Zobel, J., Moffat, A.: Inverted files for text search engines. ACM Comput. Surv. 38(2), 6 (2006)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Universität Stuttgart, Stuttgart, Germany
Benjamin Hoffmann & Dirk Nowotka
St. Petersburg State University, St. Petersburg, Russia
Mikhail Lifshits
California Institute of Technology, Pasadena, USA
Yury Lifshits

Authors

Benjamin Hoffmann
View author publications
You can also search for this author in PubMed Google Scholar
Mikhail Lifshits
View author publications
You can also search for this author in PubMed Google Scholar
Yury Lifshits
View author publications
You can also search for this author in PubMed Google Scholar
Dirk Nowotka
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Benjamin Hoffmann.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hoffmann, B., Lifshits, M., Lifshits, Y. et al. Maximal Intersection Queries in Randomized Input Models. Theory Comput Syst 46, 104–119 (2010). https://doi.org/10.1007/s00224-008-9154-6

Download citation

Received: 21 December 2007
Accepted: 15 October 2008
Published: 30 October 2008
Issue Date: January 2010
DOI: https://doi.org/10.1007/s00224-008-9154-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Maximal Intersection Queries in Randomized Input Models

Abstract

Access this article

Similar content being viewed by others

Classification of Maximum Hittings by Large Families

Union-Intersecting Set Systems

Selected Combinatorial Problems Through the Prism of Random Intersection Graphs Models

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Maximal Intersection Queries in Randomized Input Models

Abstract

Access this article

Similar content being viewed by others

Classification of Maximum Hittings by Large Families

Union-Intersecting Set Systems

Selected Combinatorial Problems Through the Prism of Random Intersection Graphs Models

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation