Keyword Sort Algorithm

I have over 1000 polls, many of which contain open-ended answers.

I would like to be able to “parse” all the words and get a ranking of the most frequently used words (excluding common words) to determine the trend.

How can I do this? Is there a program that I can use?

EDIT: If no third-party solution is available, it would be great to keep the discussion to Microsoft technologies only. Cheers.

+3
4 answers

Divide and conquer. Split your problem into many smaller problems and solve each of them.

First problem: turn the text into a list of words.

, , . , , "", , , , , , , "" , "". , , ( ), - , , .

Start by reading the survey text into a string:

// Requires "using System.IO;". The using block closes the file when done.
string source;
using (var streamReader = new StreamReader(@"c:\survey.txt"))
    source = streamReader.ReadToEnd();

, -. . , , "" "" , . ? , , :

char[] punctuation = new char[] {' ', '\n', '\r', '\t', '(', ')', '"'};
string[] tokens = source.ToLower().Split(punctuation, StringSplitOptions.RemoveEmptyEntries);

. . , . , .. , .

ToLower? ToLowerInvariant? , ; . , ToLower canoncialize , . . - "-", - " ", , . ? .

Next problem: count the occurrences of each word:

var firstPass = new Dictionary<string, int>();
foreach(string token in tokens)
{
    if (!firstPass.ContainsKey(token))
        firstPass[token] = 1;
    else
        ++firstPass[token];
} 

. , . , . , , - , . /, :

var groups = from pair in firstPass
             group pair.Key by pair.Value;

, , . . , , :

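// "group" is a contextual keyword in query expressions, so use another name.
// Sort descending so the most frequent words come first.
var sorted = from g in groups
             orderby g.Key descending
             select g;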

Finally, print the first hundred groups:

foreach(var g in sorted.Take(100))
{
  Console.WriteLine("Words with count {0}:", g.Key);
  foreach(var w in g)
    Console.WriteLine(w);
}
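
The question asks for a ranking that excludes common words, which the counting code above does not do by itself. A minimal sketch of one way to add that, assuming a small hand-written stop-word list (the `KeywordRanker` class, the `Rank` method and the list contents are hypothetical names chosen for illustration, not part of the original answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordRanker
{
    // Hypothetical, deliberately tiny stop-word list; extend it for real data.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "the", "a", "an", "and", "or", "of", "to", "is", "it", "in" },
        StringComparer.OrdinalIgnoreCase);

    // Returns (word, count) pairs, most frequent first, skipping stop words.
    // Ties are broken alphabetically so the output order is deterministic.
    public static List<KeyValuePair<string, int>> Rank(IEnumerable<string> tokens, int top)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string token in tokens)
        {
            if (StopWords.Contains(token))
                continue;
            int current;
            counts.TryGetValue(token, out current); // current stays 0 when the key is absent
            counts[token] = current + 1;
        }
        return counts
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .Take(top)
            .ToList();
    }
}
```

Calling `KeywordRanker.Rank(tokens, 100)` on the `tokens` array built earlier would give the hundred most frequent words that are not in the stop-word list. A real stop-word list would be much longer and probably loaded from a file.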

.

, , ? , . "" "" , . "" "" , . "" "" , , , - .

; , , .

, , - , . : , , .

+9

NLTK .

( NLTK) , . , , , .

UPDATE

Re: MS, NLTK can be used from .NET through IronPython. See the related questions on SO.

+4

SharpNLP is a .NET library for NLP.

0

You could index the answers with Lucene and then open the Lucene index in Luke, which lists the most frequent terms.

You can enable stemming when indexing so that words are reduced to their root form. This helps you combine different forms of the same word (plurals, different tenses, etc.), so "questions", "question", "questioned", etc. are all counted as "question". A plain word count does not do this on its own.
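
To illustrate what grouping words by root form means, here is a rough sketch in C#. It is a crude suffix stripper written for this example only; it is not the Porter algorithm and not Lucene's API, and the `CrudeStemmer` class and its methods are hypothetical names. A real stemmer handles many cases this one gets wrong (for instance, it would not merge "service" and "services").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CrudeStemmer
{
    // Suffixes are tried in this order; the first match is removed.
    private static readonly string[] Suffixes = { "ing", "ed", "es", "s" };

    public static string Stem(string word)
    {
        string lower = word.ToLowerInvariant();
        foreach (string suffix in Suffixes)
        {
            // Keep at least three characters so short words are left alone.
            if (lower.Length - suffix.Length >= 3 &&
                lower.EndsWith(suffix, StringComparison.Ordinal))
            {
                return lower.Substring(0, lower.Length - suffix.Length);
            }
        }
        return lower;
    }

    // Counts tokens per stem, so different forms of a word share one entry.
    public static Dictionary<string, int> CountByStem(IEnumerable<string> tokens)
    {
        return tokens
            .GroupBy(t => Stem(t))
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

With this sketch, "question", "questions" and "questioned" all fall under the single key "question".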

0

Source: https://habr.com/ru/post/1746392/

