Finding the number of occurrences of strings in a specific format in a given text

Question

I have a large string in which specific words (text followed by a single colon, such as "test:") may occur more than once. For example:

word: TEST: word: TEST: TEST: // random text

Here "word" occurs twice and "TEST" occurs three times, but the counts can vary. Also, the words need not appear in the same order, and there may be more text on the same line as a word (as shown after the last "TEST"). What I need to do is append the occurrence number to each word, so the output should look like this:

 word_ONE: TEST_ONE: word_TWO: TEST_TWO: TEST_THREE: // random text

The regex I wrote to find these words is ^\b[A-Za-z0-9_]{4,}\b: but I do not know how to accomplish the above efficiently. Any ideas?

+3

string c# regex .net .net-2.0

rayanisran Dec 25 '11 at 15:29

4 answers

If I understand correctly, a regular expression is not needed here.

You can split your large string on the ':' character. You may also need to read it line by line (split on '\n'). After that, you simply create a dictionary (IDictionary<string, int>) that counts the occurrences of each word. Each time you find the word x, you increment its counter in the dictionary.

EDIT

1. Read the file line by line OR split the string on '\n'.
2. Check whether your separator is present, either by splitting on ':' OR by using a regular expression.
3. Get the first element of the split array OR the first match of your regular expression.
4. Use the dictionary to count your occurrences:

    if (dictionary.ContainsKey(key)) dictionary[key]++;
    else dictionary.Add(key, 1);

If you need words instead of numbers, create another dictionary that maps each number to its word, so that the lookup returns "one" when the count is 1. Maybe there is another solution for this.
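The split-and-count steps can be sketched roughly as follows. This is only an illustration, not code from the answer: the class name OccurrenceNumberer and the method Number are hypothetical, the suffix is a plain number (_1, _2) rather than a word, and the sketch assumes a "word" is the run of letters, digits, or underscores directly before a colon, at least four characters long as in the question's regex.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class OccurrenceNumberer
{
    // Hypothetical helper: appends _1, _2, ... to each "word:" token in a line.
    public static string Number(string line)
    {
        Dictionary<string, int> counts = new Dictionary<string, int>();
        string[] parts = line.Split(':');
        StringBuilder result = new StringBuilder();

        // Every part except the last one was followed by ':' in the input.
        for (int i = 0; i < parts.Length - 1; i++)
        {
            string part = parts[i];

            // The candidate word is the trailing run of letters, digits, or '_'.
            int start = part.Length;
            while (start > 0 &&
                   (char.IsLetterOrDigit(part[start - 1]) || part[start - 1] == '_'))
            {
                start--;
            }
            string word = part.Substring(start);

            result.Append(part);
            if (word.Length >= 4) // same minimum length as the question's regex
            {
                int count;
                counts.TryGetValue(word, out count); // count is 0 for a new word
                counts[word] = ++count;
                result.Append('_').Append(count);
            }
            result.Append(':');
        }

        // Text after the last ':' is copied unchanged.
        result.Append(parts[parts.Length - 1]);
        return result.ToString();
    }
}
```

Under these assumptions, calling OccurrenceNumberer.Number on the question's sample line gives "word_1: TEST_1: word_2: TEST_2: TEST_3: // random text". Turning the numbers into ONE, TWO, ... would need the extra lookup mentioned above.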

+1

Matthias Dec 25 '11 at 15:33

I think you can do it with Regex.Replace(string, string, MatchEvaluator) and a dictionary.

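    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    class Program
    {
        static Dictionary<string, int> wordCount = new Dictionary<string, int>();

        // Called once per match; returns the word with its running count appended.
        static string AppendIndex(Match m)
        {
            string word = m.Groups[1].Value; // the word without the trailing colon
            if (wordCount.ContainsKey(word))
                wordCount[word] = wordCount[word] + 1;
            else
                wordCount.Add(word, 1);
            return word + "_" + wordCount[word] + ":"; // in the format: word_1:, word_2:
        }

        static void Main()
        {
            string text = "....";
            // \b matches every occurrence; if each word starts its own line,
            // use ^ together with RegexOptions.Multiline instead.
            string result = Regex.Replace(text, @"\b([A-Za-z0-9_]{4,}):", new MatchEvaluator(AppendIndex));
        }
    }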

see this: http://msdn.microsoft.com/en-US/library/cft8645c(v=VS.80).aspx

+1

Wint Dec 25 '11 at 16:17

Look at this example (I know it is neither perfect nor pretty). I leave the exact argument for the Split function to you; I think it can help.

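    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            string a = "word:word:test:-1+234=567:test:test:";
            string[] tks = a.Split(':');
            Regex re = new Regex(@"^\b[A-Za-z0-9_]{4,}\b");

            // Keep the tokens that look like words. The suffix is the number of
            // times the token has occurred up to and including this position
            // (the total count would give every "word" the same suffix).
            var res = tks
                .Select((x, i) => new { Token = x, Index = i })
                .Where(t => re.IsMatch(t.Token))
                .Select(t => t.Token + DecodeNO(tks.Take(t.Index + 1).Count(y => y.Equals(t.Token))));

            foreach (var item in res)
            {
                Console.WriteLine(item);
            }
            Console.ReadLine();
        }

        // Maps a count to its suffix; counts above three get no suffix.
        private static string DecodeNO(int n)
        {
            switch (n)
            {
                case 1: return "_one";
                case 2: return "_two";
                case 3: return "_three";
            }
            return "";
        }
    }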

0

Abcade Dec 25 '11 at 16:09

Casperah · Accepted Answer · 2011-12-25T16:17:32+0000

Regex is great for this job, using Replace with a match evaluator.

This example has not been tested or compiled:

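    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public class Fix
    {
        public static String Execute(string largeText)
        {
            // Verbatim string so that \w reaches the regex engine unchanged.
            // \b matches every occurrence; if each word starts its own line,
            // use ^ together with RegexOptions.Multiline instead.
            return Regex.Replace(largeText, @"\b(\w{4,}):", new Fix().Evaluator);
        }

        private Dictionary<String, int> counters = new Dictionary<String, int>();

        // Extend this list as far as needed; a count beyond it throws.
        private static String[] numbers = {"ONE", "TWO", "THREE" /* , ... */};

        // Called once per match; bumps the word's counter and appends its name.
        public String Evaluator(Match m)
        {
            String word = m.Groups[1].Value;
            int count;
            if (!counters.TryGetValue(word, out count))
                count = 0;
            count++;
            counters[word] = count;
            return word + "_" + numbers[count - 1] + ":";
        }
    }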

This should return what you requested when calling:

 result = Fix.Execute(largeText);
