What is the correct order of the prior vector in fitensemble?

Question

When using MATLAB's fitensemble to train a classifier, I can specify the 'prior' parameter as well as the 'classnames' parameter.

Is the order of the elements in both vectors the same? And what is the default order for true/false classes?

To be more specific: suppose the true class has a prior probability of 0.6 and the false class 0.4. Should I use:

ens = fitensemble(...,'prior',[0.6 0.4]) or

ens = fitensemble(...,'prior',[0.4 0.6]) or

ens = fitensemble(...,'prior',[0.4 0.6],'classnames',[true false]) or

ens = fitensemble(...,'prior',[0.4 0.6],'classnames',[false,true]) ?

I cannot find the answer in the documentation.

The perfcurve documentation is more specific:

'Prior': either a string or an array with two elements. It represents the prior probabilities for the positive and negative class, respectively. The default is 'empirical', that is, perfcurve derives prior probabilities from class frequencies. If set to 'uniform', perfcurve sets all prior probabilities equal.

matlab random-forest

jan-glx Aug 16 '13 at 9:56

1 answer

Rashid · Accepted Answer · 2014-11-01T11:12:33+0000

ens = fitensemble(X,Y,method,nlearn,learners) creates an ensemble model that predicts responses to data. The ensemble consists of the models listed in learners.

First part

You must give prior in the alphabetical (sorted) order of your class labels.

So, if the labels are ['A','B'], you use 'prior',[P(A) P(B)],

or if the labels are ['true','false'], you use 'prior',[P(false) P(true)],

or if the labels are [-1 10], you use 'prior',[P(-1) P(10)].
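The rule above can be sketched with the priors from the question. This is a minimal illustration, not fitensemble's internals: it assumes the default class order matches what unique() returns (sorted labels), as this answer describes, and the label vector Y is hypothetical.

```matlab
% Sketch: build a prior vector in sorted label order.
% Assumption: the default class order matches unique(), per the answer above.
Y = [true; false; true; true; false];   % hypothetical logical labels
classes = unique(Y);                    % sorted: false comes before true

pTrue  = 0.6;                           % prior of the true class (from the question)
pFalse = 0.4;                           % prior of the false class

priorVec = zeros(1, numel(classes));
priorVec(classes == false) = pFalse;    % slot of the false class
priorVec(classes == true)  = pTrue;     % slot of the true class

disp(priorVec)                          % priorVec is [0.4 0.6]
```

After training, the ClassNames and Prior properties of the returned ensemble can be inspected to confirm which probability was assigned to which class.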

Second part

The classnames parameter lets you call fitensemble on a subset of the classes in your data.

Imagine you have four classes A,B,C,D , so your Y will look something like this:

 Y = [A;A;B;D;B;A;C;A;A;A;D, ... ];

Now you can write 'classnames',['A';'B'] if you want fitensemble to train on only those two classes; this selects the same classes as 'classnames',['B';'A'].
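To see which observations such a selection would cover, here is a small sketch using only base functions. It assumes, as described above, that classnames restricts training to the listed classes; the label vector Y is hypothetical and the filtering with ismember only mimics that effect.

```matlab
% Sketch: which rows belong to the classes named in 'classnames'?
% Assumption: rows of unlisted classes are left out of training.
Y = {'A';'A';'B';'D';'B';'A';'C'};      % hypothetical labels with four classes
selected = {'A','B'};                   % classes we would pass as 'classnames'

keep = ismember(Y, selected);           % true for rows labeled A or B
Ysub = Y(keep);                         % labels that remain

fprintf('%d of %d observations kept\n', sum(keep), numel(Y));
```

The mask does not depend on the order of the names in selected, which matches the point that {'A','B'} and {'B','A'} pick the same classes.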

I know this is a late answer, but I hope it helps.

Example

I used the fisheriris dataset, which has three classes (setosa, versicolor, virginica).

Since it has 150 cases, 50 for each class, I shuffled the data and selected 100 samples.

    load fisheriris
    rng(12);
    idx = randperm(size(meas,1));
    meas = meas(idx,:);
    species = species(idx,:);
    meas = meas(1:100,:);
    species = species(1:100,:);
    trueprior = [sum(strcmp(species,'setosa')),...
                 sum(strcmp(species,'versicolor')),...
                 sum(strcmp(species,'virginica'))] / 100;

trueprior = [0.32,0.30,0.38] gives the true prior probabilities.
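The same class frequencies can be computed without naming each class by hand. The sketch below is an alternative formulation, not the code used in this answer; the short labels list is hypothetical and stands in for species.

```matlab
% Sketch: empirical priors in sorted class order, using unique + accumarray.
labels = {'setosa';'virginica';'versicolor';'virginica';'setosa'};  % hypothetical

[classes, ~, idx] = unique(labels);     % classes are returned in sorted order
counts   = accumarray(idx(:), 1)';      % number of observations per class
empPrior = counts / numel(labels);      % frequencies, same order as classes

for k = 1:numel(classes)
    fprintf('%-10s %.2f\n', classes{k}, empPrior(k));
end
```

Because unique() sorts the labels, empPrior comes out in the alphabetical order that the first part of this answer asks for.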

In the following code, I trained three ensembles with fitensemble. The first uses default parameters, so the prior is empirical (the same as trueprior). The second is trained with prior set to trueprior, which gives the same results as the first (because trueprior is in alphabetical order of the class labels). The third is trained with the prior in non-alphabetical order and shows different results from the first two.

    ada1 = fitensemble(meas,species,'AdaBoostM2',20,'tree');
    subplot(311)
    plot(resubLoss(ada1,'mode','individual'));
    title('Resubstitution error for default prior (empirical)');

    ada2 = fitensemble(meas,species,'AdaBoostM2',20,'tree','prior',trueprior);
    subplot(312)
    plot(resubLoss(ada2,'mode','individual'));
    title('Resubstitution error for prior with alphabetical order of class labels');

    ada3 = fitensemble(meas,species,'AdaBoostM2',20,'tree','prior',trueprior(end:-1:1));
    subplot(313)
    plot(resubLoss(ada3,'mode','individual'));
    title('Resubstitution error for prior with random order');
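To make the effect of the third call concrete, the sketch below prints which probability each class would receive when the reversed vector is read in alphabetical class order. It only illustrates the indexing, under the ordering assumption stated in this answer, and does not call fitensemble.

```matlab
% Sketch: what trueprior(end:-1:1) assigns to each class, assuming the
% prior is read in alphabetical class order as described in this answer.
classes   = {'setosa','versicolor','virginica'};
trueprior = [0.32 0.30 0.38];           % values reported above
reversed  = trueprior(end:-1:1);        % vector passed to the third ensemble

for k = 1:numel(classes)
    fprintf('%-10s intended %.2f, assigned %.2f\n', ...
            classes{k}, trueprior(k), reversed(k));
end
```

Here setosa and virginica swap their priors while versicolor keeps 0.30, so the third ensemble is trained with a different class weighting than the first two.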

I also trained fitensemble on only two classes using the classnames option:

    ada4 = fitensemble(meas,species,'AdaBoostM1',20,'tree','classnames',...
        {'versicolor','virginica'});

As evidence, AdaBoostM1, which does not support more than two classes, works fine here with only two classes.
