Awarded

Ensemble Classifiers with Bagging (MUST BE FINISHED IN 12 HOURS)

Implement the bagging algorithm provided on page 284 to construct 60 datasets from the training
sets for both the car and liver datasets. Data points will be selected using a random uniform
probability with replacement. Data points must only be drawn from the original dataset – the
points in the test set are reserved for final evaluation of the classifier ensemble. Each
constructed dataset will be used to train one classifier for the final ensemble of 60 classifiers (see
figure 5.31, page 278). Once all 60 classifiers have been created, you may evaluate the overall
classification rate of the ensemble on the test sets.
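The sampling step described above could be sketched in Python roughly as follows. This is only an illustration, not the algorithm from page 284 of the textbook: the names `bootstrap_sample`, `train_bagged_ensemble`, and `train_fn` are hypothetical, and the sketch assumes each constructed dataset has the same size as the original training set, which the posting does not state.

```python
import random

def bootstrap_sample(training_set, rng):
    """Draw len(training_set) points uniformly at random, with replacement."""
    n = len(training_set)  # assumption: sample size equals training set size
    return [training_set[rng.randrange(n)] for _ in range(n)]

def train_bagged_ensemble(training_set, train_fn, n_models=60, seed=0):
    """Train one model per bootstrap sample drawn from the training set only."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Only one constructed dataset is held in memory at a time;
        # it becomes garbage once train_fn has returned a model.
        sample = bootstrap_sample(training_set, rng)
        models.append(train_fn(sample))
    return models
```

Here `train_fn` stands for whatever routine builds a single classifier (naïve Bayes or k-nearest neighbors) from a list of data points. The test set is never passed in, so it stays reserved for the final evaluation.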

Create the following ensembles and compare their relative performance:

• 60 naïve Bayes classifiers using bins for continuous attributes (both datasets; binning applies
only to the liver dataset, so use the best bin count found in part 1).

• 60 naïve Bayes classifiers using probability density estimation (liver dataset only)

• 60 k-nearest neighbors classifiers (fixed parameters) (both datasets)

• 60 k-nearest neighbors classifiers (random k-value between 1 and 9) (both datasets)

• 30 k-nearest neighbors classifiers (using the best strategy from above) and 30 naïve Bayes
classifiers (both datasets; use probability density estimation for the liver dataset).
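As one illustration of the ensembles listed above, the random-k variant of k-nearest neighbors might look like the Python sketch below. The class `KNN` and the function `random_k_knn_ensemble` are hypothetical names. The sketch assumes numeric feature vectors compared by Euclidean distance (the categorical car attributes would need a different distance measure), treats the range 1 to 9 as inclusive, and again assumes each bootstrap sample matches the training set in size.

```python
import math
import random
from collections import Counter

class KNN:
    """Minimal k-nearest neighbors classifier over (features, label) pairs."""

    def __init__(self, k, points):
        self.k = k
        self.points = points

    def predict(self, x):
        # Sort stored points by Euclidean distance to x and keep the k closest.
        nearest = sorted(self.points, key=lambda p: math.dist(p[0], x))[:self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

def random_k_knn_ensemble(training_set, n_models=60, k_min=1, k_max=9, seed=0):
    """Build n_models k-NN classifiers, each with its own bootstrap sample and k."""
    rng = random.Random(seed)
    n = len(training_set)
    ensemble = []
    for _ in range(n_models):
        sample = [training_set[rng.randrange(n)] for _ in range(n)]
        k = rng.randint(k_min, k_max)  # randint includes both endpoints
        ensemble.append(KNN(k, sample))
    return ensemble
```

The fixed-parameter ensemble is the same loop with a constant `k`, and the mixed ensemble would append 30 such models alongside 30 naïve Bayes models.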


Hints

• Be sure to account for ties and include a tie-breaking scheme in your ensemble voting code.
• 60 datasets will take up a lot of memory; don’t try to keep all of the datasets in memory at the
same time. You have (at least) two options: train your classifiers on the fly as each
dataset is created, or save each intermediate dataset to a file and load it into and free it from
memory as necessary.
• Make sure you can keep multiple classifiers in memory at the same time. The models
shouldn’t be too big, and you don’t want to do a lot of file I/O during the classification
verification stage.
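A voting routine with tie-breaking could be sketched in Python as below. The posting does not prescribe a tie-breaking scheme, so the uniform random choice among the tied labels is only one possible option (falling back to the most frequent training class is another). The names `ensemble_vote` and `classification_rate` are hypothetical, and the sketch assumes every trained model is held in memory as a callable that maps a feature vector to a label.

```python
import random
from collections import Counter

def ensemble_vote(predictions, rng=None):
    """Return the majority label; break ties at random among the leaders."""
    counts = Counter(predictions)
    top = max(counts.values())
    # Sorting makes the candidate order deterministic for a seeded rng.
    leaders = sorted(label for label, c in counts.items() if c == top)
    if len(leaders) == 1:
        return leaders[0]
    return (rng or random).choice(leaders)

def classification_rate(models, test_set, rng=None):
    """Fraction of test points whose ensemble vote matches the true label."""
    correct = 0
    for features, label in test_set:
        votes = [model(features) for model in models]  # all models in memory
        if ensemble_vote(votes, rng) == label:
            correct += 1
    return correct / len(test_set)
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible between runs, which helps when comparing the ensembles against one another.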

Skills: C++ Programming, Matlab and Mathematica

About the Employer:
(0 reviews) United States

Project ID: #5083756

1 freelancer is bidding on average $250 for this job

RanaEnggServices

A proposal has not yet been provided

$250 USD in 12 days
(3 reviews)
3.5