What is the meaning of "isolated symbol probabilities of the English language"?

In the note, I found this phrase:

Using the isolated probabilities of English characters, you can compute the entropy of the language.

What does "isolated symbol probabilities" actually mean here? It relates to the entropy of an information source.

1 answer

It would help to know where the note came from and what its context is, but even without that I'm fairly sure it simply means that they use the frequencies of individual symbols (e.g., letters) as the basis for the entropy, rather than, say, joint probabilities (of symbol sequences) or conditional probabilities (of one particular symbol following another).

So if you have an alphabet X = {a, b, c, ..., z} and a probability P(a), P(b), ... for each symbol appearing in a text (for example, based on frequencies found in sample data), you would compute the entropy by calculating -P(x) * log(P(x)) for each symbol x individually and then summing over all of them. In that case you are clearly using the probability of each symbol in isolation, rather than the probability of each symbol in context.
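As a minimal sketch of that calculation in Python (the frequencies here come from a tiny made-up string, not a real English corpus, so the numbers are purely illustrative):

```python
import math
from collections import Counter

def entropy(text):
    """Entropy in bits/symbol, using isolated symbol probabilities:
    H = -sum over x of P(x) * log2(P(x))."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Uniform distribution over 4 symbols -> exactly 2 bits/symbol.
print(entropy("abcd"))  # 2.0

# Skewed distribution -> lower entropy (about 0.811 bits/symbol).
print(entropy("aaab"))
```

On real English text (estimating P(x) from letter frequencies in a large sample), this per-letter figure comes out considerably below log2(26), because the letters are far from uniformly distributed.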

Note, however, that the term symbol in the note you found does not necessarily refer to characters; it may refer to words or other units of text. Either way, the point being made is that the classical entropy formula is applied to the probabilities of individual events in isolation (characters, words, whatever), not to joint or conditional probabilities.
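To make the contrast concrete, here is a sketch of the conditional case the answer distinguishes from: the entropy of the next symbol given the current one, estimated from bigram counts. The function name and the maximum-likelihood estimation from raw counts are my own illustrative choices, not something from the note:

```python
import math
from collections import Counter

def conditional_entropy(text):
    """H(next symbol | current symbol), estimated from bigram counts:
    H = sum over pairs (a, b) of P(a, b) * log2(1 / P(b | a))."""
    pairs = Counter(zip(text, text[1:]))      # counts of adjacent pairs
    prev = Counter(text[:-1])                 # counts of the first symbol
    n = len(text) - 1                         # number of bigrams
    return sum((c / n) * math.log2(prev[a] / c)
               for (a, b), c in pairs.items())

# In "ababab" each symbol fully determines the next, so the conditional
# entropy is 0, even though the isolated-symbol entropy is 1 bit/symbol.
print(conditional_entropy("ababab"))  # 0.0
```

For English this conditional (in-context) entropy is well below the isolated-symbol figure, since letters strongly constrain their successors (e.g., "q" is almost always followed by "u"); that gap is exactly why the note's calculation is only an upper-bound style estimate.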


Source: https://habr.com/ru/post/910185/

