Implementing a Naive Bayes classifier in MATLAB: need guidance

I have a binary classification problem that I need to solve in MATLAB. There are two classes, and both the training and the test data are 2D coordinates drawn from Gaussian distributions.

The samples are 2D points (1000 samples for class A and 1000 samples for class B). Here are some of them:

5.867766 3.843014 5.019520 2.874257 1.787476 4.483156 4.494783 3.551501 1.212243 5.949315 2.216728 4.126151 2.864502 3.139245 1.532942 6.669650 6.569531 5.032038 2.552391 5.753817 2.610070 4.25239354.44309

When a new test point arrives, how should it be classified?

P(Class | TestPoint) is proportional to P(TestPoint | Class) * P(Class).
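As a small illustration of the "proportional to" step: multiply each class likelihood by its prior, then divide by the sum to get actual posteriors. The likelihood values below are made-up placeholders, not computed from the data above.

```matlab
% Hypothetical likelihoods P(TestPoint | Class) for classes A and B
likelihood = [0.031, 0.012];
prior      = [0.5, 0.5];            % P(Class) for A and B

% Unnormalised posteriors; dividing by their sum gives P(Class | TestPoint)
score     = likelihood .* prior;
posterior = score / sum(score);

% Pick the class with the largest posterior (1 = A, 2 = B)
[~, predicted] = max(posterior);
```

The normalisation does not change which class wins, so for classification alone comparing `score` directly is enough.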

I'm not sure how to calculate P(TestPoint | Class) for 2D coordinate data. Right now I'm using the formula

P(coordinates | class) = (coordinates - mean of this class) / (standard deviation of the points in this class).

However, my test results are poor. Am I doing something wrong?
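For comparison, here is a minimal sketch of what a naive Bayes likelihood looks like for 2D points, assuming each coordinate is modelled as an independent 1D Gaussian within the class. The quantity (x - mean)/std from the formula above is a z-score, not a probability; it only appears inside the exponent of the Gaussian density. The toy data and the helper name `gauss1d` are hypothetical.

```matlab
% 1-D Gaussian density, evaluated element-wise
gauss1d = @(x, m, s) exp(-(x - m).^2 ./ (2*s.^2)) ./ (sqrt(2*pi) .* s);

classPoint = [1 2; 3 4; 2 3; 2 2];   % hypothetical training samples, one per row
testPoint  = [2 3];                  % hypothetical test point (row vector)

m = mean(classPoint, 1);             % per-coordinate mean
s = std(classPoint, 0, 1);           % per-coordinate standard deviation (N-1 normalisation)

% Naive assumption: coordinates are independent given the class,
% so the 2-D likelihood is the product of the two 1-D densities
likelihood = prod(gauss1d(testPoint, m, s));
```

This ignores any correlation between the two coordinates; the full multivariate Gaussian keeps it through the off-diagonal covariance terms.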

2 answers

The approach is good, but the formula is incorrect. See the Wikipedia article on the multivariate normal distribution:

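$$P(\mathrm{TestPoint} \mid \mathrm{Class}) = \frac{1}{(2\pi)^{k/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right),$$

where x is the test point, \mu and \Sigma are the mean vector and covariance matrix of the class, k = 2 is the dimension, and |\Sigma| denotes the determinant of \Sigma.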

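% classPoint: 2-by-N matrix, one sample per column; testPoint: 2-by-1 column vector
mu = mean(classPoint, 2);                        % class mean (2-by-1)
Xc = bsxfun(@minus, classPoint, mu);             % centre the samples before taking the covariance
Sigma = Xc*Xc' / (size(classPoint, 2) - 1);      % 2-by-2 sample covariance
d = testPoint - mu;
proba = 1/((2*pi)^(2/2)*det(Sigma)^(1/2)) * ...
    exp(-1/2 * d' * (Sigma \ d));                % Sigma \ d avoids forming inv(Sigma)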

In your case, both classes have the same number of points (1000 each), so P(Class) = 1/2 and the prior does not change the decision.
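Putting the pieces together, here is a minimal end-to-end sketch of the answer's approach: fit a full-covariance Gaussian to each class and pick the class with the larger likelihood times prior. The synthetic data (means, covariances) and the helper names `gaussPdf` and `predictLabel` are hypothetical, chosen only so the example runs on its own.

```matlab
% Hypothetical synthetic data: two 2-D Gaussian classes, 1000 samples each,
% stored one sample per column (2-by-N)
N = 1000;
A = bsxfun(@plus, randn(2, N), [0; 0]);
B = bsxfun(@plus, [1 0; 0.5 1] * randn(2, N), [5; 5]);

% Multivariate normal density for a column vector x
gaussPdf = @(x, mu, S) exp(-0.5 * ((x - mu)' * (S \ (x - mu)))) / ...
    ((2*pi)^(numel(x)/2) * sqrt(det(S)));

% Per-class parameters; cov expects one sample per row, hence the transpose
muA = mean(A, 2);  SA = cov(A');
muB = mean(B, 2);  SB = cov(B');
priorA = 0.5;  priorB = 0.5;        % equal class sizes

% Returns 1 for class A, 2 for class B
predictLabel = @(x) 1 + (gaussPdf(x, muB, SB) * priorB > gaussPdf(x, muA, SA) * priorA);

labelAtA = predictLabel([0; 0]);
labelAtB = predictLabel([5; 5]);
```

For many test points, or in higher dimensions, comparing log-densities is numerically safer than comparing the densities themselves.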


Assuming your formula is applied correctly, another possible problem is how the features are derived from your data points. Perhaps your problem is not well suited to a linear classifier.
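One way to see how this relates to the Gaussian model above (a standard derivation, sketched here for the two classes A and B): the log ratio of the posteriors is

```latex
\log\frac{P(A \mid x)}{P(B \mid x)}
  = \log\frac{P(A)}{P(B)}
  - \tfrac{1}{2}(x-\mu_A)^\top \Sigma_A^{-1} (x-\mu_A)
  + \tfrac{1}{2}(x-\mu_B)^\top \Sigma_B^{-1} (x-\mu_B)
  - \tfrac{1}{2}\log\frac{|\Sigma_A|}{|\Sigma_B|}

% If both classes share one covariance, \Sigma_A = \Sigma_B = \Sigma,
% the x^\top \Sigma^{-1} x terms cancel and the boundary is linear in x:
\log\frac{P(A \mid x)}{P(B \mid x)}
  = (\mu_A-\mu_B)^\top \Sigma^{-1} x
  - \tfrac{1}{2}\left(\mu_A^\top \Sigma^{-1} \mu_A - \mu_B^\top \Sigma^{-1} \mu_B\right)
  + \log\frac{P(A)}{P(B)}
```

With different class covariances the quadratic term remains, so the decision boundary is a conic rather than a straight line.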


Source: https://habr.com/ru/post/1383242/

