How to create a word vector

How to create a word vector? I used one hotkey to create a dictionary vector, but it is very large and not generalized for a similar semantic word. Therefore, I heard about a vector vector that uses a neural network, which finds word similarities and word vectors. So I wanted to know how to generate this vector (algorithm) or good material to start creating a vector of words?

+6
source share
2 answers

Word vectors or the so-called distributed representations have a long history, possibly starting with the work of S. Benjio (Bengio, Y., Ducharme, R., and Vincent, P. (2001). Neural probabilistic language model. NIPS.), where he received vector words as a by-product of neural network training. lanuage model.

Many studies have shown that these vectors really capture the semantic relationship between words (see, for example, http://research.microsoft.com/pubs/206777/338_Paper.pdf ). In addition, this important article ( http://arxiv.org/abs/1103.0398 ) by Collober et al. Is a good starting point for understanding word vectors, how to obtain and use them,

In addition to word2vec, there are many ways to get them. Examples include embedding SENNA Collober et al. ( Http://ronan.collobert.com/senna/ ), embedding RNN T. Mikolova, which can be calculated using RNNToolkit ( http://www.fit.vutbr.cz / ~ imikolov / rnnlm / ) and much more. For English, ready-made attachments can be downloaded from these websites. word2vec really uses the skip-gram model (not a neural network model). Another quick code for computing word representations is GloVe ( http://www-nlp.stanford.edu/projects/glove/ ). This is an open question about which deep neural networks are needed to make good investments or not.

Depending on your application, you can use different types of vectors, therefore it is recommended to try several popular algorithms and see what is best for you.

+8
source

I think you mean Word2Vec ( https://code.google.com/p/word2vec/ ). He trains N-dimensional vectors of document vectors based on a given case. Thus, in my understanding of the word 2vec, a neural network is simply used to aggregate the size of a document vector, as well as to capture some connection between words. But it should be noted that this is not related to semantics, it simply reflects the structural relationships in your educational body.

If you want to catch semantic connectedness, look at measures based on WordNet, for example, these libraries are implemented:

To get started with word2vec, you can use their pre-processed vectors. You should find all the information about this at https://code.google.com/p/word2vec/ .

When you are looking for a Java implementation. This is a good starting point: http://deeplearning4j.org/word2vec.html

I hope this helps

Best wishes

+3
source

Source: https://habr.com/ru/post/979900/


All Articles