Skip to main content
Log in

Maximal Intersection Queries in Randomized Input Models

  • Published:
Theory of Computing Systems Aims and scope Submit manuscript

Abstract

Consider a family of sets and a single set, called the query set. How can one quickly find a member of the family which has a maximal intersection with the query set? Time constraints on the query and on a possible preprocessing of the set family make this problem challenging. Such maximal intersection queries arise in a wide range of applications, including web search, recommendation systems, and distributing on-line advertisements. In general, maximal intersection queries are computationally expensive. We investigate two well-motivated distributions over all families of sets and propose an algorithm for each of them. We show that with very high probability an almost optimal solution is found in time which is logarithmic in the size of the family. Moreover, we point out a threshold phenomenon on the probabilities of intersecting sets in each of our two input models which leads to the efficient algorithms mentioned above.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Böhm, C., Berchtold, S., Keim, D.A.: Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33(3), 322–373 (2001)

    Article  Google Scholar 

  2. Broder, A.: On the resemblance and containment of documents. In: SEQUENCES ’97: Proceedings of the Compression and Complexity of Sequences 1997, Washington, DC, USA, p. 21. IEEE Comput. Soc., Los Alamitos (1997)

    Google Scholar 

  3. Bruck, J., Naor, M.: The hardness of decoding linear codes with preprocessing. IEEE Trans. Inf. Theory 36(2), 381–385 (1990)

    Article  MATH  MathSciNet  Google Scholar 

  4. Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)

    Article  Google Scholar 

  5. Clarkson, K.: Nearest-neighbor searching and metric space dimensions. In: Nearest-Neighbor Methods for Learning and Vision: Theory and Practice. MIT Press, Cambridge (2006)

    Google Scholar 

  6. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC ’04: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, New York, NY, USA, pp. 91–100. Assoc. Comput. Mach., New York (2004)

    Chapter  Google Scholar 

  7. Fagin, R., Kumar, R., Sivakumar, D.: Efficient similarity search and classification via rank aggregation. In: SIGMOD ’03: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, pp. 301–312. Assoc. Comput. Mach., New York (2003)

    Chapter  Google Scholar 

  8. Goyal, N., Lifshits, Y., Schütze, H.: Disorder inequality: a combinatorial approach to nearest neighbor search. In: WSDM, pp. 25–32 (2008)

  9. Hjaltason, G.R., Samet, H.: Index-driven similarity search in metric spaces (survey article). ACM Trans. Database Syst. 28(4), 517–580 (2003)

    Article  Google Scholar 

  10. Hoffmann, B., Lifshits, Y., Nowotka, D.: Maximal intersection queries in randomized graph models. In: Diekert, V., Volkov, M.V., Voronkov, A. (eds.) CSR. Lecture Notes in Computer Science, vol. 4649, pp. 227–236. Springer, Berlin (2007)

    Google Scholar 

  11. Indyk, P.: Nearest neighbors in high-dimensional spaces. In: Goodman, J.E., O’Rourke, J. (eds.) Handbook of Discrete and Computational Geometry, 2nd edn. CRC Press, Boca Raton (2004). Chap. 39

    Google Scholar 

  12. Karger, D.R., Ruhl, M.: Finding nearest neighbors in growth-restricted metrics. In: STOC ’02: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, New York, NY, USA, pp. 741–750. Assoc. Comput. Mach., New York (2002)

    Chapter  Google Scholar 

  13. Kleinberg, J.M.: Two algorithms for nearest-neighbor search in high dimensions. In: STOC ’97: Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, New York, NY, USA, pp. 599–608. Assoc. Comput. Mach., New York (1997)

    Chapter  Google Scholar 

  14. Krauthgamer, R., Lee, J.R.: Navigating nets: simple algorithms for proximity search. In: SODA ’04: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, pp. 798–807. SIAM, Philadelphia (2004)

    Google Scholar 

  15. Lifshits, Y., Nowotka, D.: Estimation of the click volume by large scale regression analysis. In: Diekert, V., Volkov, M.V., Voronkov, A. (eds.) CSR. Lecture Notes in Computer Science, vol. 4649, pp. 216–226. Springer, Berlin (2007)

    Google Scholar 

  16. Maaß, M.G., Nowak, J.: A new method for approximate indexing and dictionarylookup with one error. Inf. Process. Lett. 96(5), 185–191 (2005)

    Article  Google Scholar 

  17. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)

    MATH  Google Scholar 

  18. Newman, M.E.J.: The structure and function of complex networks. SIAM Rev. 45, 167 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  19. Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search—The Metric Space Approach, vol. 32. Springer, Berlin (2006)

    MATH  Google Scholar 

  20. Zobel, J., Moffat, A.: Inverted files for text search engines. ACM Comput. Surv. 38(2), 6 (2006)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Benjamin Hoffmann.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hoffmann, B., Lifshits, M., Lifshits, Y. et al. Maximal Intersection Queries in Randomized Input Models. Theory Comput Syst 46, 104–119 (2010). https://doi.org/10.1007/s00224-008-9154-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00224-008-9154-6

Keywords

Navigation