Find the most frequently occurring words in a C# string

I am trying to find the top occurrences of words in a string.

For example:

Hello World This is a great world, This World is simply great 

From the line above, I would like to compute results such as:

  • world, 3
  • great, 2
  • hello, 1
  • this, 2

but ignoring any words shorter than 3 characters, for example "is", which occurs twice.

I have looked into Dictionary<key, value> pairs and tried to learn the LINQ GroupBy method. I know the solution lies somewhere in between, but I just cannot work out the algorithm or how to implement it.

+6
5 answers

Using LINQ and Regex

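    // Requires: using System.Linq; using System.Text.RegularExpressions;
    var words = Regex.Split("Hello World This is a great world, This World is simply great".ToLower(), @"\W+")
        .Where(s => s.Length >= 3)            // keep words of at least 3 characters
        .GroupBy(s => s)                      // one group per distinct word
        .OrderByDescending(g => g.Count());   // most frequent first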
+14
    const string input = "Hello World This is a great world, This World is simply great";

    var words = input
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Where(w => w.Length >= 3)
        .GroupBy(w => w)
        .OrderByDescending(g => g.Count());

    foreach (var word in words)
        Console.WriteLine("{0}x {1}", word.Count(), word.Key);

    // 2x World
    // 2x This
    // 2x great
    // 1x Hello
    // 1x world,
    // 1x simply

Not ideal, because it neither trims the comma nor ignores case, but it at least shows how to group and filter.
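
One way to close both gaps, as a minimal sketch: split on common punctuation as well as spaces, and group with a case-insensitive comparer. The names WordFrequency, CountWords and Separators are hypothetical, and the separator list is an assumption that only covers the usual punctuation marks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFrequency
{
    // Assumed separator set; extend it if the input contains other punctuation.
    private static readonly char[] Separators =
        { ' ', ',', '.', ';', ':', '!', '?', '\t', '\n', '\r' };

    // Hypothetical helper: counts words of at least minLength characters, ignoring case.
    public static List<KeyValuePair<string, int>> CountWords(string input, int minLength = 3)
    {
        return input
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => w.Length >= minLength)
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
            .Select(g => new KeyValuePair<string, int>(g.Key.ToLowerInvariant(), g.Count()))
            .OrderByDescending(p => p.Value)
            .ToList();
    }

    public static void Main()
    {
        const string input = "Hello World This is a great world, This World is simply great";
        foreach (var pair in CountWords(input))
            Console.WriteLine("{0}x {1}", pair.Value, pair.Key);
        // 3x world
        // 2x this
        // 2x great
        // 1x hello
        // 1x simply
    }
}
```

Passing the separators to Split avoids a separate trimming pass, and the comparer lets "World" and "world" fall into the same group without lowercasing the whole input first.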

+3

I would avoid LINQ, Regex and the like here, since it sounds as if you want to work out the algorithm and understand it rather than call a ready-made function.

Not that those are not viable solutions. They definitely are.

Try something like this:

    Dictionary<string, int> dictionary = new Dictionary<string, int>();
    string sInput = "Hello World, This is a great World. I love this great World";

    sInput = sInput.Replace(",", "");  // Just cleaning up a bit
    sInput = sInput.Replace(".", "");  // Just cleaning up a bit
    string[] arr = sInput.Split(' ');  // Create an array of words

    foreach (string word in arr)       // Loop over the words
    {
        if (word.Length >= 3)          // If it meets our criterion of at least 3 letters
        {
            if (dictionary.ContainsKey(word))               // If it is already in the dictionary
                dictionary[word] = dictionary[word] + 1;    // increment the count,
            else
                dictionary[word] = 1;                       // otherwise add it with a count of 1
        }
    }

    foreach (KeyValuePair<string, int> pair in dictionary)  // Loop through the dictionary
        Response.Write(string.Format("Key: {0}, Pair: {1}<br />", pair.Key, pair.Value));
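
The loop above only counts; a Dictionary does not keep its entries ordered by value. To list the most frequent words first while still avoiding LINQ, one option is to copy the pairs into a list and sort it, as in this sketch. The names DictionaryRanking and SortByCount are hypothetical, and the sample counts are the ones the loop would produce for the sentence above.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryRanking
{
    // Hypothetical helper: copies the counts into a list and sorts it, highest count first.
    // List<T>.Sort is not a stable sort, so words with equal counts may come out in any order.
    public static List<KeyValuePair<string, int>> SortByCount(Dictionary<string, int> counts)
    {
        var list = new List<KeyValuePair<string, int>>(counts);
        list.Sort((a, b) => b.Value.CompareTo(a.Value));
        return list;
    }

    public static void Main()
    {
        var counts = new Dictionary<string, int>
        {
            { "Hello", 1 }, { "World", 3 }, { "This", 1 },
            { "great", 2 }, { "love", 1 }, { "this", 1 }
        };

        foreach (var pair in SortByCount(counts))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```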
+3

I wrote a string processor class. You can use it.

Example:

 metaKeywords = bodyText.Process(blackListWords: prepositions).OrderByDescending().TakeTop().GetWords().AsString(); 

Class:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class StringProcessor
    {
        private static List<String> PrepositionList;

        // Replaces the Persian forms of Kaf and Yeh with their Arabic counterparts.
        public static string ToNormalString(this string strText)
        {
            if (String.IsNullOrEmpty(strText)) return String.Empty;
            char chNormalKaf = (char)1603;
            char chNormalYah = (char)1610;
            char chNonNormalKaf = (char)1705;
            char chNonNormalYah = (char)1740;
            string result = strText.Replace(chNonNormalKaf, chNormalKaf);
            result = result.Replace(chNonNormalYah, chNormalYah);
            return result;
        }

        // Splits the text, strips punctuation from each word and counts the words
        // that are long enough and not blacklisted.
        public static List<KeyValuePair<String, Int32>> Process(this String bodyText,
            List<String> blackListWords = null,
            int minimumWordLength = 3,
            char splitor = ' ',
            bool perWordIsLowerCase = true)
        {
            string[] btArray = bodyText.ToNormalString().Split(splitor);
            long numberOfWords = btArray.LongLength;
            Dictionary<String, Int32> wordsDic = new Dictionary<String, Int32>(1);
            foreach (string word in btArray)
            {
                if (word != null)
                {
                    string lowerWord = word;
                    if (perWordIsLowerCase)
                        lowerWord = word.ToLower();
                    var normalWord = lowerWord.Replace(".", "").Replace("(", "").Replace(")", "")
                        .Replace("?", "").Replace("!", "").Replace(",", "")
                        .Replace("<br>", "").Replace(":", "").Replace(";", "")
                        .Replace("،", "").Replace("-", "").Replace("\n", "").Trim();
                    // ">=" so that minimumWordLength really is the minimum (the original used ">").
                    if (normalWord.Length >= minimumWordLength && !normalWord.IsMemberOfBlackListWords(blackListWords))
                    {
                        if (wordsDic.ContainsKey(normalWord))
                        {
                            var cnt = wordsDic[normalWord];
                            wordsDic[normalWord] = ++cnt;
                        }
                        else
                        {
                            wordsDic.Add(normalWord, 1);
                        }
                    }
                }
            }
            List<KeyValuePair<String, Int32>> keywords = wordsDic.ToList();
            return keywords;
        }

        // Sorts by frequency (default) or by the word itself, descending.
        public static List<KeyValuePair<String, Int32>> OrderByDescending(this List<KeyValuePair<String, Int32>> list, bool isBasedOnFrequency = true)
        {
            List<KeyValuePair<String, Int32>> result = null;
            if (isBasedOnFrequency)
                result = list.OrderByDescending(q => q.Value).ToList();
            else
                result = list.OrderByDescending(q => q.Key).ToList();
            return result;
        }

        public static List<KeyValuePair<String, Int32>> TakeTop(this List<KeyValuePair<String, Int32>> list, Int32 n = 10)
        {
            List<KeyValuePair<String, Int32>> result = list.Take(n).ToList();
            return result;
        }

        public static List<String> GetWords(this List<KeyValuePair<String, Int32>> list)
        {
            List<String> result = new List<String>();
            foreach (var item in list)
            {
                result.Add(item.Key);
            }
            return result;
        }

        public static List<Int32> GetFrequency(this List<KeyValuePair<String, Int32>> list)
        {
            List<Int32> result = new List<Int32>();
            foreach (var item in list)
            {
                result.Add(item.Value);
            }
            return result;
        }

        // string.Join avoids the trailing separator that the original loop appended.
        public static String AsString<T>(this List<T> list, string seprator = ", ")
        {
            return string.Join(seprator, list);
        }

        private static bool IsMemberOfBlackListWords(this String word, List<String> blackListWords)
        {
            bool result = false;
            if (blackListWords == null) return false;
            foreach (var w in blackListWords)
            {
                if (w.ToNormalString().Equals(word))
                {
                    result = true;
                    break;
                }
            }
            return result;
        }
    }
+2
    string words = "Hello World This is a great world, This World is simply great".ToLower();

    var results = words.Split(' ')
        .Where(x => x.Length >= 3)   // keep words of at least 3 characters
        .GroupBy(x => x)
        .Select(x => new { Count = x.Count(), Word = x.Key })
        .OrderByDescending(x => x.Count);

    foreach (var item in results)
        Console.WriteLine(String.Format("{0} occurred {1} times", item.Word, item.Count));

    Console.ReadLine();

To get the word with the most occurrences:

results.First().Word;
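
First() throws if no word passes the length filter. When several top words are wanted, or the input might contain nothing that qualifies, a sketch along these lines may be safer. TopWordsExample, TopWords and the parameter n are hypothetical names, and the splitting is the same space-only split as above, so punctuation still sticks to words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopWordsExample
{
    // Hypothetical helper: returns the n most frequent words of at least minLength characters.
    public static List<string> TopWords(string text, int n, int minLength = 3)
    {
        return text.ToLower()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(x => x.Length >= minLength)
            .GroupBy(x => x)
            .OrderByDescending(g => g.Count())
            .Take(n)
            .Select(g => g.Key)
            .ToList();
    }

    public static void Main()
    {
        var top = TopWords("Hello World This is a great world, This World is simply great", 2);
        Console.WriteLine(string.Join(", ", top));   // world, this

        // FirstOrDefault returns null instead of throwing when nothing qualifies.
        Console.WriteLine(TopWords("a is of", 1).FirstOrDefault() ?? "(none)");   // (none)
    }
}
```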

0

Source: https://habr.com/ru/post/904952/

