Search for all substring positions in a larger string in C#

I have a large string that I need to parse, and I need to find every instance of extract"(me,i-have lots. of]punctuation and store the index of each one in a list.

So, say this substring appeared both at the beginning and in the middle of the larger string. Both occurrences should be found and their indexes added to the List, so the List would contain 0 and the other index, whatever it is.

I played around, and string.IndexOf does almost what I am looking for. I wrote some code, but it does not work and I could not pinpoint what was wrong:

    List<int> inst = new List<int>();
    int index = 0;
    while (index < source.LastIndexOf("extract\"(me,i-have lots. of]punctuation", 0) + 39)
    {
        int src = source.IndexOf("extract\"(me,i-have lots. of]punctuation", index);
        inst.Add(src);
        index = src + 40;
    }
  • inst = the List
  • source = the large string

Any better ideas?

+47
string c#
Apr 14 '10 at 21:52
10 answers

Here is an example extension method for it:

    public static List<int> AllIndexesOf(this string str, string value)
    {
        if (String.IsNullOrEmpty(value))
            throw new ArgumentException("the string to find may not be empty", "value");
        List<int> indexes = new List<int>();
        for (int index = 0;; index += value.Length)
        {
            index = str.IndexOf(value, index);
            if (index == -1)
                return indexes;
            indexes.Add(index);
        }
    }

If you put this in a static class and import its namespace with using, it shows up as a method on any string, and you can simply do:

    List<int> indexes = "fooStringfooBar".AllIndexesOf("foo"); // indexes contains 0 and 9

For more information about extension methods, see http://msdn.microsoft.com/en-us/library/bb383977.aspx

Same thing using an iterator:

    public static IEnumerable<int> AllIndexesOf(this string str, string value)
    {
        if (String.IsNullOrEmpty(value))
            throw new ArgumentException("the string to find may not be empty", "value");
        for (int index = 0;; index += value.Length)
        {
            index = str.IndexOf(value, index);
            if (index == -1)
                break;
            yield return index;
        }
    }
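One caveat with the iterator version: the body of an iterator method only runs when the result is enumerated, so the ArgumentException is not thrown at the call site but later, on the first MoveNext. A common remedy is to validate eagerly in a plain method and move the loop into a local iterator function. A minimal sketch, assuming C# 7 or later; the class name is arbitrary, and passing StringComparison.Ordinal is an assumption (the answer uses the culture-sensitive default of IndexOf):

```csharp
using System;
using System.Collections.Generic;

public static class StringSearchExtensions
{
    // Validates arguments immediately, then returns a lazily evaluated sequence.
    public static IEnumerable<int> AllIndexesOf(this string str, string value)
    {
        if (str == null)
            throw new ArgumentNullException(nameof(str));
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("the string to find may not be empty", nameof(value));

        return Iterate();

        // The iterator body only runs when the caller enumerates the result.
        IEnumerable<int> Iterate()
        {
            for (int index = 0; ; index += value.Length)
            {
                index = str.IndexOf(value, index, StringComparison.Ordinal);
                if (index == -1)
                    yield break;
                yield return index;
            }
        }
    }
}
```

With this split, "abc".AllIndexesOf("") throws right away, even if the result is never enumerated.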
+89
Apr 14 '10

Why not use the built-in Regex class?

    public static IEnumerable<int> GetAllIndexes(this string source, string matchString)
    {
        matchString = Regex.Escape(matchString);
        foreach (Match match in Regex.Matches(source, matchString))
        {
            yield return match.Index;
        }
    }

If you need to reuse the expression, compile it once and cache it somewhere, and add an overload that takes a Regex instead of the matchString parameter.
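A sketch of that suggestion: the string overload escapes the text and delegates to a Regex overload, and a frequently used pattern is built once with RegexOptions.Compiled and cached in a static field. The class and field names (RegexSearchExtensions, Patterns.Phrase) are hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexSearchExtensions
{
    // Convenience overload: escapes the literal text and builds a new Regex per call.
    public static IEnumerable<int> GetAllIndexes(this string source, string matchString)
    {
        return source.GetAllIndexes(new Regex(Regex.Escape(matchString)));
    }

    // Reusable overload: takes a Regex that was built (and possibly compiled) once.
    public static IEnumerable<int> GetAllIndexes(this string source, Regex pattern)
    {
        foreach (Match match in pattern.Matches(source))
        {
            yield return match.Index;
        }
    }
}

public static class Patterns
{
    // Hypothetical cached instance; Compiled trades construction time for faster matching.
    public static readonly Regex Phrase =
        new Regex(Regex.Escape("extract\"(me,i-have lots. of]punctuation"), RegexOptions.Compiled);
}
```

Caching only pays off when the same pattern is searched many times; for a one-off search the string overload is simpler.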

+10
Apr 15 '10

Using LINQ:

    public static IEnumerable<int> IndexOfAll(this string sourceString, string subString)
    {
        // Escape the text so punctuation such as ( or . is matched literally
        return Regex.Matches(sourceString, Regex.Escape(subString)).Cast<Match>().Select(m => m.Index);
    }
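Escaping matters for the string in the question: it contains an unbalanced (, so passing it straight to Regex.Matches throws when the pattern is parsed, while Regex.Escape makes every character match literally. A small self-contained check; the class and method names are only for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    const string Phrase = "extract\"(me,i-have lots. of]punctuation";

    // Returns true if the unescaped phrase is rejected as a regular expression.
    public static bool UnescapedPatternThrows()
    {
        try
        {
            Regex.Matches("irrelevant", Phrase);
            return false;
        }
        catch (ArgumentException)
        {
            // In newer .NET versions this is a RegexParseException, which derives from ArgumentException.
            return true;
        }
    }

    // With escaping, every character is literal, so the results agree with IndexOf.
    public static int[] EscapedIndexes(string source) =>
        Regex.Matches(source, Regex.Escape(Phrase)).Cast<Match>().Select(m => m.Index).ToArray();
}
```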
+6
Apr 15 '10 at 2:38

Polished version with case-insensitive support:

    public static int[] AllIndexesOf(string str, string substr, bool ignoreCase = false)
    {
        if (string.IsNullOrWhiteSpace(str) || string.IsNullOrWhiteSpace(substr))
        {
            throw new ArgumentException("String or substring is not specified.");
        }

        var indexes = new List<int>();
        int index = 0;
        // index++ resumes one character after each hit, so overlapping matches are included
        while ((index = str.IndexOf(substr, index, ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal)) != -1)
        {
            indexes.Add(index++);
        }
        return indexes.ToArray();
    }
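A variation, offered as a sketch rather than part of the answer: taking a StringComparison instead of a bool lets the caller pick ordinal, culture-sensitive, or case-insensitive matching. Like the answer, it resumes one character after each hit, so overlapping occurrences are reported. The class name is arbitrary:

```csharp
using System;
using System.Collections.Generic;

public static class ComparisonSearch
{
    // Returns every start index of substr in str under the given comparison rules.
    public static int[] AllIndexesOf(this string str, string substr, StringComparison comparison)
    {
        if (str == null)
            throw new ArgumentNullException(nameof(str));
        if (string.IsNullOrEmpty(substr))
            throw new ArgumentException("Substring is not specified.", nameof(substr));

        var indexes = new List<int>();
        int index = 0;
        // index++ resumes one character after each hit, so overlaps are included.
        while ((index = str.IndexOf(substr, index, comparison)) != -1)
        {
            indexes.Add(index++);
        }
        return indexes.ToArray();
    }
}
```

For example, "FooBARfoo".AllIndexesOf("foo", StringComparison.OrdinalIgnoreCase) finds 0 and 6, while StringComparison.Ordinal finds only 6.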
+4
Jan 13 '13 at 22:24
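    public List<int> GetPositions(string source, string searchString)
    {
        // An empty searchString would be found at the same index forever
        if (string.IsNullOrEmpty(searchString))
            throw new ArgumentException("searchString may not be empty");
        List<int> ret = new List<int>();
        int len = searchString.Length;
        int start = -len;
        while (true)
        {
            start = source.IndexOf(searchString, start + len);
            if (start == -1)
            {
                break;
            }
            else
            {
                ret.Add(start);
            }
        }
        return ret;
    }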

Call it like this:

    List<int> list = GetPositions("bob is a chowder head bob bob sldfjl", "bob");
    // list will contain 0, 22, 26
+1
Apr 14 '10

Good answer from @Matti Virkkunen. Here is a variant that steps back one character after each match:

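    public static List<int> AllIndexesOf(this string str, string value)
    {
        if (String.IsNullOrEmpty(value))
            throw new ArgumentException("the string to find may not be empty", "value");
        List<int> indexes = new List<int>();
        for (int index = 0;; index += value.Length)
        {
            index = str.IndexOf(value, index);
            if (index == -1)
                return indexes;
            indexes.Add(index);
            // Step back one character so the next search starts at index + value.Length - 1.
            // Skipped for one-character values, where it would find the same index forever.
            if (value.Length > 1)
                index--;
        }
    }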

This also covers inputs like AOOAOOA, where the two occurrences of the substring AOOA share a character: the output is 0 and 3.
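Stepping back one character means the next search starts at index + value.Length - 1, so an occurrence that begins earlier than that is skipped: for "AAA" in "AAAAA" this version reports 0 and 2 but misses 1. To report every overlapping occurrence, resume the search one character after each hit, as the polished version above does with index++. A sketch; the class and method names are made up:

```csharp
using System;
using System.Collections.Generic;

public static class OverlapSearch
{
    // Reports every start position, including overlapping occurrences,
    // by resuming the search one character after each hit.
    public static List<int> AllIndexesOfOverlapping(this string str, string value)
    {
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("the string to find may not be empty", nameof(value));

        var indexes = new List<int>();
        for (int index = str.IndexOf(value, StringComparison.Ordinal);
             index != -1;
             index = str.IndexOf(value, index + 1, StringComparison.Ordinal))
        {
            indexes.Add(index);
        }
        return indexes;
    }
}
```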

+1
Aug 26 '16 at 20:59

Based on the code I use to find multiple instances of a string in a larger string, your code would look like this:

    List<int> inst = new List<int>();
    int index = 0;
    while (true)
    {
        index = source.IndexOf("extract\"(me,i-have lots. of]punctuation", index);
        if (index == -1)
            break; // no more matches; adding -1 and incrementing would restart at 0 forever
        inst.Add(index);
        index++;
    }
0
Apr 14 '10

@csam is correct in theory, although the code as posted will not compile; it can be refactored to:

    public static IEnumerable<int> IndexOfAll(this string sourceString, string matchString)
    {
        matchString = Regex.Escape(matchString);
        return from Match match in Regex.Matches(sourceString, matchString)
               select match.Index;
    }
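Regex.Matches only returns non-overlapping matches, so this reports AOOA in AOOAOOA once, at 0. If overlapping occurrences are wanted from a regex, a common trick is to put the escaped text inside a zero-width lookahead, (?=...): each match then consumes no characters and the engine tries every position. A sketch, with a made-up method name:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadSearch
{
    // A zero-width lookahead consumes no characters, so the engine tests every
    // position and overlapping occurrences are all reported.
    public static IEnumerable<int> IndexOfAllOverlapping(this string sourceString, string matchString)
    {
        string pattern = "(?=" + Regex.Escape(matchString) + ")";
        return from Match match in Regex.Matches(sourceString, pattern)
               select match.Index;
    }
}
```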
0
Dec 13 '12 at 15:45
    public static Dictionary<string, IEnumerable<int>> GetWordsPositions(this string input, string[] substrings)
    {
        Dictionary<string, IEnumerable<int>> wordsPositions = new Dictionary<string, IEnumerable<int>>();
        foreach (string st in substrings)
        {
            // Escape so punctuation in st is matched literally; Add throws if st appears twice
            IEnumerable<int> indexOfAll = Regex.Matches(input, Regex.Escape(st)).Cast<Match>().Select(m => m.Index);
            wordsPositions.Add(st, indexOfAll);
        }
        return wordsPositions;
    }
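An alternative sketch built on IndexOf, which needs no escaping at all and tolerates duplicate entries in the input array (Dictionary.Add would throw on those). The choice of ordinal, non-overlapping matching and of materialized lists is an assumption:

```csharp
using System;
using System.Collections.Generic;

public static class MultiSearch
{
    // Maps each distinct substring to the (non-overlapping) positions where it occurs.
    public static Dictionary<string, List<int>> GetWordsPositions(this string input, IEnumerable<string> substrings)
    {
        var positions = new Dictionary<string, List<int>>();
        foreach (string word in substrings)
        {
            if (string.IsNullOrEmpty(word) || positions.ContainsKey(word))
                continue; // skip empty entries and duplicates instead of throwing

            var found = new List<int>();
            for (int i = input.IndexOf(word, StringComparison.Ordinal);
                 i != -1;
                 i = input.IndexOf(word, i + word.Length, StringComparison.Ordinal))
            {
                found.Add(i);
            }
            positions[word] = found;
        }
        return positions;
    }
}
```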
0
Jul 02 '16 at 16:04

Without Regex, with a choice of StringComparison:

    string search = "123aa456AA789bb9991AACAA";
    string pattern = "AA";
    IEnumerable<int> indexes = Enumerable.Range(0, search.Length)
        .Select(index => new
        {
            Index = index,
            Length = (index + pattern.Length) > search.Length ? search.Length - index : pattern.Length
        })
        .Where(searchbit => searchbit.Length == pattern.Length
            && pattern.Equals(search.Substring(searchbit.Index, searchbit.Length), StringComparison.OrdinalIgnoreCase))
        .Select(searchbit => searchbit.Index);

This returns {3, 8, 19, 22}. An empty pattern matches every position.
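The same result with less bookkeeping: only visit start positions where the pattern still fits, and compare in place with the string.Compare overload that takes two offsets and a length, which avoids allocating a Substring per position. A sketch; the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CompareSearch
{
    // Tests each start position where the pattern still fits and compares in place.
    public static IEnumerable<int> IndexesOf(string search, string pattern, StringComparison comparison)
    {
        int lastStart = search.Length - pattern.Length; // negative when the pattern is longer
        return Enumerable.Range(0, Math.Max(0, lastStart + 1))
            .Where(i => string.Compare(search, i, pattern, 0, pattern.Length, comparison) == 0);
    }
}
```

Called as CompareSearch.IndexesOf("123aa456AA789bb9991AACAA", "AA", StringComparison.OrdinalIgnoreCase), it yields the same {3, 8, 19, 22}.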

For multiple patterns:

    string search = "123aa456AA789bb9991AACAA";
    string[] patterns = new string[] { "aa", "99" };
    IEnumerable<int> indexes = patterns.SelectMany(pattern => Enumerable.Range(0, search.Length)
        .Select(index => new
        {
            Index = index,
            Length = (index + pattern.Length) > search.Length ? search.Length - index : pattern.Length
        })
        .Where(searchbit => searchbit.Length == pattern.Length
            && pattern.Equals(search.Substring(searchbit.Index, searchbit.Length), StringComparison.OrdinalIgnoreCase))
        .Select(searchbit => searchbit.Index));

This returns {3, 8, 19, 22, 15, 16}
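With several patterns, the flattened result no longer says which pattern matched where. One option, sketched here with a value tuple (C# 7+), is to keep the pattern next to each index and sort by position; the class and method names are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultiPatternSearch
{
    // Returns (Pattern, Index) pairs for every match of every pattern, ordered by position.
    public static List<(string Pattern, int Index)> FindAll(string search, string[] patterns, StringComparison comparison)
    {
        return patterns
            .SelectMany(pattern => Enumerable
                .Range(0, Math.Max(0, search.Length - pattern.Length + 1))
                .Where(i => string.Compare(search, i, pattern, 0, pattern.Length, comparison) == 0)
                .Select(i => (Pattern: pattern, Index: i)))
            .OrderBy(hit => hit.Index)
            .ToList();
    }
}
```

For the same input and patterns with StringComparison.OrdinalIgnoreCase, the indexes come out as 3, 8, 15, 16, 19, 22, each paired with "aa" or "99".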

0
Jun 09 '17 at 15:43