The browser you are using is not supported by this website. All versions of Internet Explorer are no longer supported, either by us or Microsoft (read more here: https://www.microsoft.com/en-us/microsoft-365/windows/end-of-ie-support).

Please use a modern browser to fully experience our website, such as the newest versions of Edge, Chrome, Firefox or Safari etc.

Approximation algorithms for Hamming clustering problems

Author

Summary, in English

We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by varrho. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.



We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits straightforward polynomial-time solutions when k=O(logn) and p=O(1), or when p=2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any var epsilon>0 it is NP-hard to split S into at most pk1/7−var epsilon clusters whose Hamming diameter does not exceed the p-diameter, and that solving HDC exactly is an NP-complete problem already for p=3. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [T. Gonzalez, Theoret. Comput. Sci. 38 (1985) 293–306], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2O(pvarrho/var epsilon)kO(p/var epsilon)n2-time (1+var epsilon)-approximation algorithm for HRC. In particular, it runs in polynomial time when p=O(1) and varrho=O(log(k+n)). Finally, we show how to find in Image time a set L of O(plogk) strings of length n such that for each string in S there is at least one string in L within distance (1+var epsilon)varrho, for any constant 0<var epsilon<1.

Department/s

  • Computer Science

Publishing year

2004

Language

English

Pages

289-301

Publication/Series

Journal of Discrete Algorithms

Volume

2

Issue

2 spec. iss.

Document type

Journal article

Publisher

Elsevier

Topic

  • Computer Science

Keywords

  • Theorem proving
  • Polynomials
  • Integer programming
  • Algorithms
  • Approximation theory
  • Mathematical models

Status

Published

Project

  • VR 2002-4049

ISBN/ISSN/Other

  • ISSN: 1570-8667