How do I split a string into an array of strings without losing the matched parts in C#?

What I have

    string ImageRegPattern = @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";
    string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

What I want

    string[] s;
    s[0] = "http://www.dsa.com/asd/jpg/good.jpg";
    s[1] = "This is a good day";
    s[2] = "http://www.a.com/b.png";
    s[3] = "We are the Best friendshttp://www.c.com";

Bonus:
If the trailing URL can also be split off as shown below, that would be even better, but if not, that is fine.

    s[3] = "We are the Best friends";
    s[4] = "http://www.c.com";

The question
I am trying to split the string with the following code:

    string[] s = Regex.Split(sourceString, ImageRegPattern,
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

But the result is not what I want: the Split method removes every substring that matches ImageRegPattern, and I want those substrings to stay. I have checked the Regex documentation on MSDN, and there seems to be no method that fits my needs. How can I do this?

4 answers

You need something like this method, which first finds all the matches and then collects them into a list together with the unmatched substrings between them.

UPDATE: Added a conditional to handle if no matches are found.

    // Requires: using System.Collections.Generic;
    //           using System.Text.RegularExpressions;
    private static IEnumerable<string> InclusiveSplit(string source, string pattern)
    {
        List<string> parts = new List<string>();
        int currIndex = 0;

        // First, find all the matches. These are your separators.
        MatchCollection matches = Regex.Matches(source, pattern,
            RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

        // If there are no matches, there is nothing to split, so just return a
        // collection with just the source string in it.
        if (matches.Count < 1)
        {
            parts.Add(source);
        }
        else
        {
            foreach (Match match in matches)
            {
                // If the match begins after our current index, we need to add the
                // portion of the source string between the last match and the
                // current match.
                if (match.Index > currIndex)
                {
                    parts.Add(source.Substring(currIndex, match.Index - currIndex));
                }

                // Add the matched value, of course, to make the split inclusive.
                parts.Add(match.Value);

                // Update the current index so we know if the next match has an
                // unmatched substring before it.
                currIndex = match.Index + match.Length;
            }

            // Finally, check if there is a bit of unmatched string at the end of
            // the source string.
            if (currIndex < source.Length)
                parts.Add(source.Substring(currIndex));
        }

        return parts;
    }

For your example input, the result would be:

    [0] "http://www.dsa.com/asd/jpg/good.jpg"
    [1] "This is a good day"
    [2] "http://www.a.com/b.png"
    [3] "We are the Best friendshttp://www.c.com"
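As an alternative to the hand-written loop above, Regex.Split itself keeps the delimiters when the pattern is wrapped in a capturing group, because .NET includes captured text in the returned array. The following is a minimal sketch of that idea, not the method from this answer. The class name `CaptureSplitDemo`, the helper `SplitKeepingMatches`, and the second pattern `AnyUrlPattern` (an attempt at the bonus case) are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureSplitDemo
{
    // Image URLs only. The outer parentheses form the capturing group that
    // makes Regex.Split keep each match in its output.
    public const string ImagePattern =
        @"(http://[\w./]*\.(?:jpg|png|gif))";

    // Hypothetical extension for the bonus case: image URLs are tried first,
    // then any other run of word characters, dots and slashes after http://.
    public const string AnyUrlPattern =
        @"(http://[\w./]*\.(?:jpg|png|gif)|http://[\w./]+)";

    public static string[] SplitKeepingMatches(string source, string pattern)
    {
        return Regex.Split(source, pattern, RegexOptions.IgnoreCase)
                    // A match at the very start or end yields an empty piece.
                    .Where(part => part.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good day"
                 + "http://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

        foreach (string part in SplitKeepingMatches(a, ImagePattern))
            Console.WriteLine(part);

        Console.WriteLine("---");

        foreach (string part in SplitKeepingMatches(a, AnyUrlPattern))
            Console.WriteLine(part);
    }
}
```

With `ImagePattern` the sample input should come back as the four pieces listed above; with `AnyUrlPattern` the trailing `http://www.c.com` is split off as a fifth piece. Note that the second pattern has no way to tell where a non-image URL ends if text follows it without a space, so it is only reliable when such a URL sits at the end of the string or before a character outside `[\w./]`.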

You just can't underestimate the power of regex:

(.*?)([A-Z][\w\s]+(?=http|$))

Explanation:

  • (.*?) : group and lazily match everything until an uppercase letter is found; this group holds the URL
  • ( : start group
    • [A-Z] : matches one uppercase letter
    • [\w\s]+ : matches any of a-z, A-Z, 0-9, _, \n, \r, \t, \f or " " (space), 1 or more times
    • (?=http|$) : lookahead, checks that what follows is http or the end of the line
    • ) : close the group (this group holds the text)


Note: this solution is intended to match the string, not split it.
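To show how the two groups could be turned into the requested array, here is a minimal sketch that runs the pattern with Regex.Matches and collects group 1 (the URL) and group 2 (the text) from each match. The class name `MatchGroupsDemo` and the helper `ExtractParts` are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchGroupsDemo
{
    // Pattern from the answer: a lazy prefix (the URL), then text that starts
    // with an uppercase letter and runs up to the next "http" or the end.
    private const string Pattern = @"(.*?)([A-Z][\w\s]+(?=http|$))";

    public static List<string> ExtractParts(string source)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(source, Pattern))
        {
            // Group 1 can be empty if the text starts right at the match.
            if (m.Groups[1].Length > 0)
                parts.Add(m.Groups[1].Value);
            parts.Add(m.Groups[2].Value);
        }
        return parts;
    }

    public static void Main()
    {
        string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good day"
                 + "http://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

        foreach (string part in ExtractParts(a))
            Console.WriteLine(part);
    }
}
```

On the sample input this returns the two image URLs and the two text fragments. The trailing `http://www.c.com` is not returned, because the pattern only matches a URL that is followed by text beginning with an uppercase letter. The approach also relies on the URLs themselves containing no uppercase letters.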


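It seems to me that you need a multi-step process: first insert a separator around each match, then split on that separator with String.Split:

    string resultString = Regex.Replace(rawString,
        @"(http://.*?/\w+\.(jpg|png|gif))", "|$1|", RegexOptions.IgnoreCase);

    // A match at the very start leaves a leading separator; strip it so the
    // first element is not an empty string.
    if (resultString.StartsWith("|"))
        resultString = resultString.Substring(1);

    string[] parts = resultString.Split('|');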

The obvious answer here is, of course, not to use Split at all, but rather to match the image patterns and retrieve them. That said, it is possible to use Split:

    string ImageRegPattern = @"(?=(http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif))|(?<=(\.jpg|\.png|\.gif))";

This matches any position in the string that is followed by an image URL, or any position that is immediately preceded by .jpg, .gif or .png.

I really do not recommend doing this; I am only saying that you can.
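For completeness, here is a minimal sketch of how such a zero-width pattern could be passed to Regex.Split. It deliberately differs from the pattern above in one respect: the groups are non-capturing, because .NET's Regex.Split adds captured text to its output, which would duplicate the URLs and extensions. The class name `LookaroundSplitDemo` and the helper `SplitAtBoundaries` are assumptions for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundSplitDemo
{
    // Zero-width boundaries: just before an image URL, or just after an
    // image extension. Non-capturing groups keep the Split output clean.
    private const string BoundaryPattern =
        @"(?=http://[\w./]*?\.(?:jpg|png|gif))|(?<=\.(?:jpg|png|gif))";

    public static string[] SplitAtBoundaries(string source)
    {
        return Regex.Split(source, BoundaryPattern, RegexOptions.IgnoreCase)
                    // A boundary at index 0 produces an empty first piece.
                    .Where(part => part.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good day"
                 + "http://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

        foreach (string part in SplitAtBoundaries(a))
            Console.WriteLine(part);
    }
}
```

The lookbehind fires after any `.jpg`, `.png` or `.gif`, including one that appears in ordinary text, which is one reason this approach is fragile.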


Source: https://habr.com/ru/post/1483424/

