What is the vector dimension of a word?

I am currently a deep learning enthusiast and am reading about word2vec in this tutorial: https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-3-more-fun-with-word-vectors

For the CBOW or skip-gram model, I see that the dimension of the word vectors is 300 while the size of the dictionary is 15,000. In a previous article I read that we can encode words as vectors, so I assumed that the vector dimension of a word should be equal to the size of the dictionary. To put it as a different question: what is the dimension of a word vector, how do you choose it, and how can it be visualized?

2 answers

"Word Vector Dimension" is a dimension of a vector that you trained with a tutorial. Technically, you can choose any measurement, for example, 10, 100, 300 and even 1000. The industrial norm is 300-500, because we experimented with various measurements (300, 400, 500, ... 1000, etc.), But they did not notice a significant performance improvement after 300-400. (It also depends on your training data.) As it sounds, a larger measurement means more complex calculations. However, if we set the measurement too low, then there is not enough vector space to collect the information contained in the entire training document.

How to visualize it?

You cannot easily visualize a 300-dimensional vector, and a direct plot of 300-dimensional vectors probably would not be very useful to you anyway. What we can do is project these vectors into a 2-dimensional space, the space we are most familiar with and can easily interpret.
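As a sketch, here is one way to do that projection with PCA from scikit-learn (t-SNE is another popular choice); the gensim model and the word list below are toy assumptions:

```python
# Sketch: project trained 300-D vectors to 2-D with PCA and plot them.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from gensim.models import Word2Vec

# Toy corpus; with a real corpus the 2-D layout becomes meaningful.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]
model = Word2Vec(sentences, vector_size=300, min_count=1)

words = ["cat", "dog", "mat", "rug"]
vectors = [model.wv[w] for w in words]

coords = PCA(n_components=2).fit_transform(vectors)  # 300-D -> 2-D

for (x, y), word in zip(coords, words):
    plt.scatter(x, y)
    plt.annotate(word, (x, y))
plt.show()
```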

Your last assumption, that the vector dimension of a word should be equal to the size of the dictionary, is WRONG! The vocabulary could be 171,476 words (the total number of words in English), while the word vector dimension (mostly 300-500; you don't want to train 1-billion-dimensional vectors, right?) is the size of the vector that you decide on in advance, before learning from the data. My video (shameless plug) helps you understand the important concepts of word vectors: AI with the best
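To make the distinction concrete, here is a small sketch (again with gensim, on an assumed toy corpus) showing that the vocabulary size and the vector dimension are two independent numbers: the embedding matrix simply has one row per vocabulary word and one column per dimension.

```python
# Sketch: vocabulary size and vector dimension are separate choices.
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat"], ["the", "dog", "ran"]]
model = Word2Vec(sentences, vector_size=300, min_count=1)

print(len(model.wv))           # vocabulary size: 5 unique words here
print(model.wv.vectors.shape)  # (5, 300): one 300-D row per word
```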


In fact, the vector dimension of a word does not reflect the size of the dictionary. What word2vec does is map words to representations in a vector space, and you can make this space any dimension you want. Each word is represented by a point in this space, and the word's vector is simply the coordinates of that point. Moreover, words that usually appear in the same contexts end up near each other in this space.
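A quick sketch of that "nearby in the space" property, using gensim's most_similar, which ranks words by cosine similarity of their vectors (the corpus here is a toy assumption; with real data the neighbors become meaningful):

```python
# Sketch: words from similar contexts get similar vectors, so
# most_similar (cosine similarity) surfaces them as neighbors.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "mat"],
]
model = Word2Vec(sentences, vector_size=50, min_count=1)

print(model.wv.most_similar("cat", topn=3))
```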

Hope this helps

