t-SNE high-dimensional data visualization

I have a Twitter corpus that I am using to build a sentiment analysis application. The corpus has 5k tweets that have been labelled as negative, neutral, or positive.

To represent the text, I use gensim word2vec pretrained vectors. Each word is mapped to 300 dimensions. For a tweet, I add up all the word vectors to get a single 300-dimensional vector. Thus, each tweet is mapped to a single vector of size 300.
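The summation described above can be sketched roughly as follows. This is only an illustration: `tweet_vector` and `toy_vectors` are hypothetical names, and a plain dict of random vectors stands in for the pretrained model so the snippet runs without downloading anything. A gensim `KeyedVectors` object supports the same `in` and `[]` lookups, so it should be usable in place of the dict.

```python
import numpy as np

def tweet_vector(tokens, word_vectors, dim=300):
    """Sum the vectors of all in-vocabulary tokens.

    Tokens missing from the vocabulary are skipped; a tweet with no
    known tokens maps to the zero vector.
    """
    total = np.zeros(dim, dtype=np.float32)
    for token in tokens:
        if token in word_vectors:
            total += word_vectors[token]
    return total

# Toy stand-in for pretrained word2vec vectors (assumption: random data).
rng = np.random.default_rng(0)
toy_vectors = {
    word: rng.standard_normal(300).astype(np.float32)
    for word in ["good", "bad", "movie"]
}

vec = tweet_vector(["good", "movie", "unknownword"], toy_vectors)
print(vec.shape)
```

Summing (rather than averaging) makes the vector norm grow with tweet length, which may or may not be desirable for the downstream classifier.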

I visualize my data using t-SNE (the tsne Python package). See attached image 1: red dots = negative tweets, blue dots = neutral tweets, and green dots = positive tweets.

tweets represented using word2vec
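For readers who want to reproduce a projection like this, here is a minimal sketch. Note the swap: it uses scikit-learn's `TSNE` rather than the `tsne` package mentioned in the post, and it runs on synthetic data (150 random 300-dimensional vectors with made-up labels), so the names `X`, `labels`, and `color_for_label` are assumptions, not the poster's data.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for the tweet vectors: 150 points in 300 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((150, 300))
labels = rng.integers(0, 3, size=150)  # 0 = negative, 1 = neutral, 2 = positive

# Project to 2-D; perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X)

# Colour scheme matching the plot described in the post.
color_for_label = {0: "red", 1: "blue", 2: "green"}
colors = [color_for_label[int(label)] for label in labels]

print(embedding.shape)
```

The `embedding` array and `colors` list can then be passed to a scatter plot, for example `matplotlib.pyplot.scatter(embedding[:, 0], embedding[:, 1], c=colors)`.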

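Question: In the plot there is no clear separation (boundary) between the data points. Can I assume that this is also the case for the original points in 300 dimensions?

i.e. if points overlap in the t-SNE plot, do they also overlap in the original space, and vice versa?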

+4
1

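Question: In the plot there is no clear separation (boundary) between the data points. Can I assume that this is also the case for the original points in 300 dimensions?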

NO. , , , . , , - (, 3d-) .

, , . :

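With PCA, you can reduce the 300 dimensions to, say, 10. The fraction of information retained when going from 300 dimensions to 10 (keeping the 10 components with the largest eigenvalues) is sum(top-10-eigenvalues)/sum(300-eigenvalues).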
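The ratio sum(top-10-eigenvalues)/sum(300-eigenvalues) can be computed directly from the eigenvalues of the covariance matrix. The sketch below uses NumPy only; `retained_variance` is a hypothetical helper name, and the data is synthetic (500 points generated from 10 latent factors plus a little noise), chosen so that 10 components should capture almost all of the variance.

```python
import numpy as np

def retained_variance(X, k):
    """Fraction of total variance kept by the top-k principal components."""
    cov = np.cov(X, rowvar=False)            # features are columns
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # sort eigenvalues descending
    eigvals = np.clip(eigvals, 0.0, None)    # guard against tiny negatives
    return eigvals[:k].sum() / eigvals.sum()

# Synthetic data (assumption): 10 latent factors embedded in 300 dimensions.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 10))
mixing = rng.standard_normal((10, 300))
X = latent @ mixing + 0.01 * rng.standard_normal((500, 300))

ratio_10 = retained_variance(X, 10)
ratio_2 = retained_variance(X, 2)
print(f"top 10 components keep {ratio_10:.4f} of the variance")
print(f"top 2 components keep {ratio_2:.4f} of the variance")
```

On real tweet vectors the ratio for 2 components is typically far below 1, which is one way to quantify how much structure a 2-D picture can hide. t-SNE is not a linear projection, so this ratio applies to PCA only and serves as a rough analogy.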

+5

Source: https://habr.com/ru/post/1625318/

