In a two-class classification problem, two sets of characteristic vectors (feature vectors) are provided, one for each class.
The sets are stored in text files [url removed, login to view] and c2.dat.
Each file consists of lines, and each line holds 20 values (characteristics),
so each row stores one vector of characteristics.
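The file layout described above can be read with a few lines of Python. This is only a sketch: the brief does not state the delimiter, so the code assumes values are separated by whitespace or commas, and the names `parse_vectors` and `expected_len` are hypothetical.

```python
def parse_vectors(lines, expected_len=None):
    """Turn an iterable of text lines into a list of float vectors.

    Assumption: values on a line are separated by whitespace and/or commas.
    Blank lines are skipped. If expected_len is given, every row must have
    exactly that many values.
    """
    vectors = []
    for number, line in enumerate(lines, start=1):
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        row = [float(value) for value in fields]
        if expected_len is not None and len(row) != expected_len:
            raise ValueError(f"line {number}: expected {expected_len} values, got {len(row)}")
        vectors.append(row)
    return vectors
```

An open file object can be passed directly as `lines`, for example `parse_vectors(open("c2.dat"), expected_len=20)`.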
A) Implement the classification algorithm k-NN, (k-Nearest Neighbors). The algorithm steps are:
1. Calculate the Euclidean distances from the vector to be classified, x, to all the vectors in
the first file ([url removed, login to view]).
2. Calculate the Euclidean distances from the vector to be classified, x, to all the vectors in
the second file (c2.dat).
3. Sort these Euclidean distances in ascending order (using both classes/files).
4. Choose an odd integer k, e.g., k = 3. After sorting, take the k smallest Euclidean distances.
How many come from the first class and how many from the second?
Comment: The vector x is assigned to the class that occurs most often among these k nearest neighbors.
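Steps 1 to 4 can be sketched as follows. This is a minimal illustration, not a reference solution: the function names `euclidean` and `knn_classify` are hypothetical, the classes are labeled 1 and 2 for convenience, and each class is assumed to be a list of equal-length numeric vectors.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(x, class1, class2, k=3):
    """Classify x with k-NN; returns (label, votes).

    label is 1 or 2, votes counts how many of the k nearest
    neighbors came from each class.
    """
    # Steps 1 and 2: distance from x to every vector, tagged with its class.
    distances = [(euclidean(x, v), 1) for v in class1]
    distances += [(euclidean(x, v), 2) for v in class2]
    # Step 3: sort the distances of both classes together, ascending.
    distances.sort(key=lambda pair: pair[0])
    # Step 4: count the class labels among the k smallest distances.
    votes = Counter(label for _, label in distances[:k])
    label = votes.most_common(1)[0][0]
    return label, votes
```

With two classes and an odd k, the vote cannot end in a tie, which is why the brief asks for an odd k.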
B) Out of the 20 characteristics of the problem, select 3 (your choice).
From each data set, randomly select 60% of the vectors as training vectors and use the remaining 40% to estimate
the classification error of the k-NN algorithm (for k = 1 and k = 3).
Repeat this random 60% selection 10 times and calculate the average classification error for k = 3.
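One possible way to organize the repeated 60/40 evaluation is sketched below. The names `knn_predict`, `split_60_40` and `average_error`, the fixed seed, and the rounding of the 60% cut are all assumptions made for illustration; `features` is the tuple of the 3 column indices chosen in (B).

```python
import math
import random
from collections import Counter


def knn_predict(x, train, k):
    """train is a list of (vector, label) pairs; returns the majority label."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def split_60_40(vectors, rng):
    """Shuffle a copy of vectors and cut it into 60% training, 40% test."""
    shuffled = list(vectors)
    rng.shuffle(shuffled)
    cut = round(0.6 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def average_error(class1, class2, features, k, repeats=10, seed=0):
    """Mean test error of k-NN over `repeats` random 60/40 splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(repeats):
        train, test = [], []
        # Split each class separately, keeping only the selected columns.
        for label, data in ((1, class1), (2, class2)):
            train_part, test_part = split_60_40(data, rng)
            train += [([v[i] for i in features], label) for v in train_part]
            test += [([v[i] for i in features], label) for v in test_part]
        wrong = sum(knn_predict(x, train, k) != label for x, label in test)
        errors.append(wrong / len(test))
    return sum(errors) / len(errors)
```

Calling `average_error(class1, class2, features=(0, 1, 2), k=3)` gives the averaged error for k = 3; the same call with `k=1` covers the other case.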
C) Instead of the k-NN classifier, construct and train a neural network of your choice on the same 3 characteristics
you selected in (B). Then proceed as in (B) and comment on and compare the
performance of the two classifiers.
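Since the network is left open ("of your choice"), the following is just one small option: a single hidden layer of tanh units with a sigmoid output, trained by stochastic gradient descent on the cross-entropy loss. Everything here is an assumption for illustration, including the name `train_mlp`, the hidden size, the learning rate, the epoch count, and the encoding of the two classes as targets 0 and 1. It also assumes the 3 selected characteristics have been scaled to a moderate range (for example standardized), since tanh units saturate on large inputs.

```python
import math
import random


def train_mlp(samples, hidden=4, epochs=200, lr=0.1, seed=0):
    """Train a 1-hidden-layer network; samples is a list of (vector, target).

    target must be 0 or 1. Returns a function mapping a vector to 0 or 1.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # w1[j]: input weights of hidden unit j, with the bias as last element.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    # w2: output weights, with the bias as last element.
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in w1]
        z = sum(w * hj for w, hj in zip(w2, h)) + w2[-1]
        z = max(-60.0, min(60.0, z))  # clamp to keep exp() in range
        return h, 1.0 / (1.0 + math.exp(-z))

    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            h, y = forward(x)
            delta_out = y - t  # d(cross-entropy)/dz for a sigmoid output
            for j in range(hidden):
                # Back-propagate through tanh before w2[j] is changed.
                delta_h = delta_out * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * delta_out * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * delta_h * x[i]
                w1[j][-1] -= lr * delta_h
            w2[-1] -= lr * delta_out

    return lambda x: 1 if forward(x)[1] >= 0.5 else 0
```

To proceed as in (B), the returned predictor would be trained on each random 60% split and its error measured on the remaining 40%, so that the averaged error can be compared with the k-NN result on the same footing.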