I have a Twitter corpus that I use to build a sentiment analysis application. The corpus contains 5k tweets, each labeled negative, neutral, or positive.
To represent the text, I use gensim with pretrained word2vec vectors. Each word is represented in 300 dimensions. For a tweet, I sum all of its word vectors to get a single 300-dimensional vector. Thus, each tweet is mapped to one vector of size 300.
I visualize my data using t-SNE (the tsne Python package). See attached Image 1: red dots = negative tweets, blue dots = neutral tweets, and green dots = positive tweets.
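The summing step described above can be sketched roughly as follows. This is only an illustration: `toy_vectors` is a hypothetical random lookup table standing in for the pretrained model, and `tweet_vector` is a made-up helper name; skipping out-of-vocabulary words is an assumption, since the question does not say how they are handled.

```python
import numpy as np

DIM = 300  # dimensionality of the word vectors, as stated in the question

# Hypothetical stand-in for a pretrained word2vec lookup (random values, illustration only).
rng = np.random.default_rng(0)
toy_vectors = {w: rng.normal(size=DIM) for w in ["very", "good", "bad", "movie"]}

def tweet_vector(tokens, vectors, dim=DIM):
    """Sum the vectors of all known tokens; unknown tokens are skipped (an assumption)."""
    total = np.zeros(dim)
    for tok in tokens:
        if tok in vectors:
            total += vectors[tok]
    return total

vec = tweet_vector("very good movie".split(), toy_vectors)
print(vec.shape)
```

With a real gensim `KeyedVectors` object in place of the dictionary, the same membership test and indexing should work unchanged, though that has not been checked against the asker's setup.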
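Question: in the plot, the data (tweets) are not well separated. Does that mean they are also not separable in the original 300-dimensional space?
I.e., if the tweets are not separable in the t-SNE plot, are they not separable in the original space either?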
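No. t-SNE is a dimensionality reduction technique, and projecting 300 dimensions down to a 2D (or 3D) plot loses information, so classes that overlap in the plot may still be separable in the original space.
To see how much can be lost, consider a simpler reduction as an example: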
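Take PCA: start from the 300-dimensional data and keep, say, only the top 10 components. The fraction of information retained when going from 300 dimensions (the original) to 10 (the top 10 eigenvectors, ordered by eigenvalue) is:
sum(top-10-eigenvalues)/sum(300-eigenvalues)
If this ratio is "low", a large part of the information was discarded by the reduction.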
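As a rough sketch of the ratio above, the eigenvalues can be taken from the covariance matrix of the data. The matrix `X` here is random data standing in for the real tweet vectors, and `retained_variance` is a hypothetical helper name, so the printed number says nothing about the actual corpus.

```python
import numpy as np

def retained_variance(X, k):
    """Fraction of total variance captured by the top-k principal components."""
    Xc = X - X.mean(axis=0)                   # center each feature
    cov = np.cov(Xc, rowvar=False)            # 300 x 300 covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)     # guard against tiny negative round-off
    return eigvals[:k].sum() / eigvals.sum()

# Hypothetical stand-in for the (n_tweets x 300) matrix of tweet vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 300))

ratio = retained_variance(X, 10)
print(f"variance retained by top 10 components: {ratio:.3f}")
```

The same quantity is what PCA implementations usually report as the cumulative explained variance ratio, so a library routine could be used instead of the manual eigendecomposition.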
Source: https://habr.com/ru/post/1625318/