I am implementing a Multinomial Naive Bayes classifier from scratch to classify text in Python.
I compute a feature count for each class and a probability distribution over the features.
According to my implementation, I get the following results:
Suppose I have the following corpus:
corpus = [
{'text': 'what is chat service?', 'category': 'what_is_chat_service'},
{'text': 'Why should I use your chat service?', 'category': 'why_use_chat_service'}
]
According to Naive Bayes, for this corpus the prior probabilities of both classes will be 0.5.
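For reference, this is roughly how the priors come out (a minimal sketch; the variable names are illustrative):

from collections import Counter

corpus = [
    {'text': 'what is chat service?', 'category': 'what_is_chat_service'},
    {'text': 'Why should I use your chat service?', 'category': 'why_use_chat_service'}
]

# Prior of a class = fraction of documents labelled with that class.
category_counts = Counter(doc['category'] for doc in corpus)
priors = {cat: count / len(corpus) for cat, count in category_counts.items()}
print(priors)  # {'what_is_chat_service': 0.5, 'why_use_chat_service': 0.5}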
After some preprocessing, including lowercasing, stop word removal and punctuation removal, I get the following lists of tokens:
- text 1: [chat, service]
- text 2: [use, chat, service]
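A sketch of that preprocessing step (the stop word set here is a small illustrative subset, not my exact list):

import string

STOP_WORDS = {'what', 'is', 'why', 'should', 'i', 'your'}  # illustrative subset

def preprocess(text):
    # Lowercase, strip punctuation, drop stop words.
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]

print(preprocess('what is chat service?'))                # ['chat', 'service']
print(preprocess('Why should I use your chat service?'))  # ['use', 'chat', 'service']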
, " -"
:
class                   chat   service   P(class|features)
what_is_chat_service    1      1         0.5
why_use_chat_service    1      1         0.5
Both classes get the same probability, so it is a tie between the 2 classes.
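A sketch of scoring that reproduces the numbers in these tables, reusing priors from the sketch above: a token seen in a class contributes likelihood 1, an unseen token a tiny fallback. (Note: to match the second table below, the class token sets here keep 'what' and 'is', i.e. they are built from the raw lowercased texts rather than the stop-word-filtered tokens.)

UNSEEN_PROB = 1e-9  # fallback for tokens never seen in a class

class_tokens = {
    'what_is_chat_service': {'what', 'is', 'chat', 'service'},
    'why_use_chat_service': {'why', 'should', 'i', 'use', 'your', 'chat', 'service'},
}

def score(query_tokens, category):
    # Posterior is proportional to prior * product of per-token likelihoods.
    posterior = priors[category]
    for tok in query_tokens:
        posterior *= 1 if tok in class_tokens[category] else UNSEEN_PROB
    return posterior

for cat in class_tokens:
    print(cat, score(['chat', 'service'], cat))
# what_is_chat_service 0.5
# why_use_chat_service 0.5  -> a tie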
Now, if the query is "what is chat service", I get:
class                   what   is     chat   service   P(class|features)
what_is_chat_service    1      1      1      1         0.5 (higher)
why_use_chat_service    1e-9   1e-9   1      1         5e-19
For words never seen in a class, I use a very small probability = 1e-9.
i.e., the query is classified as the category of text 1: what_is_chat_service
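Running the score sketch above on this query reproduces the table:

query = ['what', 'is', 'chat', 'service']
for cat in class_tokens:
    print(cat, score(query, cat))
# what_is_chat_service 0.5
# why_use_chat_service ~5e-19  (0.5 * 1e-9 * 1e-9 * 1 * 1)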
Now, consider a slightly different corpus:
corpus = [
{'text': 'what is chat service?', 'category': 'what_is_chat_service'},
{'text': 'what is the benefit of using chat service?', 'category': 'why_use_chat_service'}
]
Here, every word of text 1 also occurs in text 2. So for the query "what is chat service?", even though it exactly matches text 1, both classes get the same probability, while I would expect it to be classified as "what_is_chat_service".
How should cases like this be handled? How does the Naive Bayes implementation in sklearn deal with them? Any suggestions would be appreciated.
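For comparison, a minimal sketch of the same data run through sklearn's MultinomialNB (sklearn avoids hard-coded fallbacks like 1e-9 by applying Laplace/Lidstone smoothing through the alpha parameter):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['what is chat service?', 'Why should I use your chat service?']
labels = ['what_is_chat_service', 'why_use_chat_service']

# Bag-of-words counts; alpha=1.0 means add-one (Laplace) smoothing.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
clf = MultinomialNB(alpha=1.0)
clf.fit(X, labels)

query = vectorizer.transform(['what is chat service?'])
print(clf.predict(query))        # ['what_is_chat_service']
print(clf.predict_proba(query))  # smoothing keeps the unseen-word class above zero

Because the smoothed likelihoods reflect word frequencies rather than bare presence, the exact-match class wins here instead of tying.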