Counting occurrences of strings in a specific format within a given text

I have a long line of text in which specific words (text followed by a single colon, for example "test:") may occur more than once. For example:

word: TEST: word: TEST: TEST: // random text 

Here “word” occurs twice and “TEST” occurs three times, but the counts may vary. In addition, the words need not appear in any particular order, and there may be more text on the same line after the words (as shown after the last "TEST" in the example). What I need to do is append an occurrence number to each word; for example, the output line should look like this:

 word_ONE: TEST_ONE: word_TWO: TEST_TWO: TEST_THREE: // random text 

The regex I wrote to match these words is ^\b[A-Za-z0-9_]{4,}\b: but I do not know how to do the above efficiently. Any ideas?

+3
4 answers

Regex is great for this job - use Regex.Replace with a match evaluator:

This example has not been tested or compiled:

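    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public class Fix
    {
        // Extend this list to cover the highest count you expect.
        private static readonly string[] numbers = { "ONE", "TWO", "THREE" /* , ... */ };

        // How many times each word has been seen so far.
        private readonly Dictionary<string, int> counters = new Dictionary<string, int>();

        public static string Execute(string largeText)
        {
            // Verbatim string so that \b and \w reach the regex engine.
            // No ^ anchor, because the words can appear anywhere in the line.
            return Regex.Replace(largeText, @"\b(\w{4,}):", new Fix().Evaluator);
        }

        public string Evaluator(Match m)
        {
            string word = m.Groups[1].Value;
            int count;
            if (!counters.TryGetValue(word, out count))
                count = 0;
            count++;
            counters[word] = count;
            return word + "_" + numbers[count - 1] + ":";
        }
    }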

This should return what you requested when calling:

 result = Fix.Execute(largeText); 
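
The evaluator approach can be checked against the sample line from the question. Below is a minimal sketch, not the answer's exact code: the names `OccurrenceTagger`, `Tag` and `Names` are hypothetical, and falling back to digits once the word list runs out is my own assumption. The pattern has no `^` anchor, so words in the middle of the line are matched as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OccurrenceTagger
{
    // Hypothetical word list; counts beyond it fall back to digits (assumption).
    private static readonly string[] Names = { "ONE", "TWO", "THREE" };

    public static string Tag(string text)
    {
        // Fresh counters per call, so repeated calls do not share state.
        var counters = new Dictionary<string, int>();

        // The lambda is invoked once per match, left to right.
        return Regex.Replace(text, @"\b(\w{4,}):", m =>
        {
            string word = m.Groups[1].Value;
            int count;
            counters.TryGetValue(word, out count); // count is 0 for a new word
            count++;
            counters[word] = count;

            string suffix = count <= Names.Length ? Names[count - 1] : count.ToString();
            return word + "_" + suffix + ":";
        });
    }
}
```

Calling `OccurrenceTagger.Tag("word: TEST: word: TEST: TEST: // random text")` should give the line requested in the question. The dictionary keys are case-sensitive, so "test" and "TEST" would be counted separately.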
+2

If I understand correctly, a regular expression is not needed here.

You can split your large string on the ':' character. You may also need to read it line by line (split on '\n'). After that, you simply create a dictionary ( IDictionary<string, int> ) that keeps track of the occurrences of each word. Each time you find the word x, you increment its counter in the dictionary.

EDIT

  • Read the file line by line OR split the string on '\n'
  • Check if your separator is present. Either by splitting into ':' OR using a regular expression.
  • Get the first element from a split array OR the first match of your regular expression.
  • Use the dictionary to count your occurrences.

    if (dictionary.ContainsKey(key)) dictionary[key]++;
    else dictionary.Add(key, 1);

  • If you need words instead of numbers, create a second dictionary that maps them, so that its entry for key 1 is "one". Maybe there is another solution for this.
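
The steps above can be sketched roughly as follows, for a single line. This is only one reading of the answer: the names `SplitCounter`, `Tag` and `Names` are hypothetical, and I assume a word sits directly before its colon, is at least four characters long, and consists of letters, digits or underscores (note that `char.IsLetterOrDigit` also accepts non-ASCII letters).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class SplitCounter
{
    // Hypothetical second dictionary mapping a count to its word.
    private static readonly Dictionary<int, string> Names = new Dictionary<int, string>
    {
        { 1, "ONE" }, { 2, "TWO" }, { 3, "THREE" }
    };

    public static string Tag(string line)
    {
        string[] parts = line.Split(':');
        var counts = new Dictionary<string, int>();
        var result = new StringBuilder();

        for (int i = 0; i < parts.Length; i++)
        {
            string part = parts[i];
            string word = part.TrimStart();

            // Every part except the last one was followed by a colon.
            bool followedByColon = i < parts.Length - 1;
            bool isWord = word.Length >= 4
                && word.All(c => char.IsLetterOrDigit(c) || c == '_');

            result.Append(part);
            if (followedByColon && isWord)
            {
                if (counts.ContainsKey(word)) counts[word]++;
                else counts.Add(word, 1);

                // Fall back to digits when no word is defined (assumption).
                string suffix;
                if (!Names.TryGetValue(counts[word], out suffix))
                    suffix = counts[word].ToString();
                result.Append('_').Append(suffix);
            }
            if (followedByColon) result.Append(':');
        }
        return result.ToString();
    }
}
```

Leading whitespace is kept as it was, so the rebuilt line differs from the input only by the added suffixes. For multi-line input, the same method could be applied to each line after splitting on '\n'.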

+1

I think you can do it with Regex.Replace(string, string, MatchEvaluator) and a dictionary.

    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    class Program
    {
        static readonly Dictionary<string, int> wordCount = new Dictionary<string, int>();

        static string AppendIndex(Match m)
        {
            string word = m.Groups[1].Value; // the word without the trailing colon
            if (wordCount.ContainsKey(word))
                wordCount[word] = wordCount[word] + 1;
            else
                wordCount.Add(word, 1);
            return word + "_" + wordCount[word] + ":"; // in the format: word_1:, word_2:
        }

        static void Main()
        {
            string text = "....";
            // No ^ anchor, so words anywhere in the line are matched.
            string result = Regex.Replace(text, @"\b([A-Za-z0-9_]{4,}):", new MatchEvaluator(AppendIndex));
        }
    }

see this: http://msdn.microsoft.com/en-US/library/cft8645c(v=VS.80).aspx

+1

Look at this example (I know it is not perfect and not very nice). I leave the exact argument to the Split function up to you; I think it can help.

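    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            string a = "word:word:test:-1+234=567:test:test:";
            string[] tks = a.Split(':');
            Regex re = new Regex(@"^\b[A-Za-z0-9_]{4,}\b");

            // Keep only tokens that look like words. Note that the suffix is the
            // total number of times the token occurs, not a running index.
            var res = from x in tks
                      where re.Matches(x).Count > 0
                      select x + DecodeNO(tks.Count(y => y.Equals(x)));

            foreach (var item in res)
            {
                Console.WriteLine(item);
            }
            Console.ReadLine();
        }

        // Maps a count to its word suffix; counts above three get no suffix.
        private static string DecodeNO(int n)
        {
            switch (n)
            {
                case 1: return "_one";
                case 2: return "_two";
                case 3: return "_three";
            }
            return "";
        }
    }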
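
Because `tks.Count(...)` counts a token over the whole array, every occurrence of the same token gets the same suffix. If a running index is wanted instead, as in the question, one possible variation is to count only up to the current position. This is a sketch under my own assumptions: the names `RunningIndex` and `Tag` are hypothetical, and it uses numeric suffixes to stay short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RunningIndex
{
    // A token counts as a word if it is 4+ letters, digits or underscores.
    private static readonly Regex WordPattern = new Regex(@"^[A-Za-z0-9_]{4,}$");

    public static List<string> Tag(string input)
    {
        string[] tokens = input.Split(':');
        return tokens
            // Pair each token with its position in the array.
            .Select((token, index) => new { token, index })
            .Where(t => WordPattern.IsMatch(t.token))
            // Count equal tokens up to and including the current position.
            .Select(t => t.token + "_" + tokens.Take(t.index + 1).Count(y => y == t.token))
            .ToList();
    }
}
```

For the sample string "word:word:test:-1+234=567:test:test:" this should yield word_1, word_2, test_1, test_2, test_3. Re-counting the prefix for every token is quadratic in the number of tokens, so a dictionary of counters is the better choice for large inputs.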
0

Source: https://habr.com/ru/post/904953/

