Homework 3 - Classifiers

Due: Friday February 28, 2014 at 11:55pm

  1. Describe how Bayesian Belief Networks differ from Naive Bayesian Classifiers. When would it be more beneficial to use a Bayesian Belief Network instead of a Naive Bayesian Classifier?
  2. Support Vector Machines (SVMs) are often very accurate, particularly on linearly separable data, but some models can be quite slow to train. Briefly research the topic of scalable SVMs and describe some of the approaches researchers are using to reduce training time.
  3. (15 points) Write a k-nearest neighbor program that operates on an array of vectors. Each vector will be an entry consisting of n numeric attributes. You may choose the size of your database (a minimum of 50 vectors is required), the value for n (at least 5 attributes are required), and the value for k (k must be at least 5). Use Euclidean distance.

    Use the provided program knn_dataset.c to create a dataset with three different classes. Note that the program defaults to 10 attributes and 60 entries in the database (20 per class). If you change the number of attributes (or the number of classes), you will need to alter the seed values in main() accordingly. A sample data file generated with the default values is knn_dataset.csv.

    To test your program, randomly generate a test vector with each attribute value between 0 and 100. Generate at least 5 such test vectors and output the classification of each.

Upload your answers to the short-answer questions either as a file or in the Moodle Notes field. Upload your source code for the k-nearest neighbor classifier as a file with an extension that clearly indicates the programming language you used (e.g., cpp for C++, pl for Perl).