Homework 4 - Clustering

Due: Monday March 17, 2014 at 11:55pm

  1. (10.3 in book): Use an example to show why the k-means algorithm may not find the global optimum, that is, may fail to minimize the within-cluster variation.
  2. (10.6 in book): Both the k-means and k-medoids algorithms can perform effective clustering.
    1. Illustrate the strength and weakness of k-means in comparison with k-medoids.
    2. Illustrate the strength and weakness of these schemes in comparison with a hierarchical clustering scheme.
  3. (10.12 in book): Present conditions under which density-based clustering is more suitable than partitioning-based clustering and hierarchical clustering. Give application examples to support your argument.
  4. (10.15 in book, modified): Data cubes and multidimensional databases contain nominal, ordinal, and numeric data in hierarchical or aggregate forms. Discuss how you could use one of the clustering methods in either Chapter 11 or Chapter 12 to find clusters in large data cubes, containing a variety of data types, effectively and efficiently.
  5. (11.1 in book): Traditional clustering methods are rigid in that they require each object to belong exclusively to only one cluster. Explain why this is a special case of fuzzy clustering. You may use k-means as an example.
  6. (10 pts) Using the dataset for the k-nearest-neighbor problem on the previous assignment, create a program to generate a k-nearest-neighbor graph (as used by Chameleon) and use one of the visualization tools to visualize the graph. Upload both an image file containing a sample visualization and your code as your answer to this problem.
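For problem 6, the following is a minimal sketch of one way to build and draw a k-nearest-neighbor graph; it is an illustration only, not a required or graded approach. The input file name data.csv, the choice k = 5, and the use of scikit-learn, NetworkX, and matplotlib are all assumptions; your program may construct the graph and produce the image any way you like.

    import numpy as np
    import networkx as nx
    import matplotlib.pyplot as plt
    from sklearn.neighbors import kneighbors_graph

    # Load a numeric feature matrix: one data point per row, comma-separated.
    # "data.csv" and k = 5 are placeholders, not values taken from the assignment.
    X = np.loadtxt("data.csv", delimiter=",")
    k = 5

    # Sparse adjacency matrix: entry (i, j) is 1 when j is among i's k nearest neighbors.
    A = kneighbors_graph(X, n_neighbors=k, mode="connectivity", include_self=False)

    # Symmetrize so an edge exists if either point lists the other as a neighbor,
    # then build an undirected graph from the resulting 0/1 matrix.
    A = A.maximum(A.T)
    G = nx.from_numpy_array(A.toarray())

    # Draw with a force-directed layout and save an image file for submission.
    pos = nx.spring_layout(G, seed=42)
    nx.draw(G, pos, node_size=20, width=0.5)
    plt.savefig("knn_graph.png", dpi=200)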
Upload your answers to the short-answer questions either as a file or by answering in the Moodle Notes field. Upload your source code for the coding question as a file, with an extension that clearly indicates the programming language you used (e.g., .cpp for C++, .pl for Perl, etc.).