How to split a string into an array of strings without losing any part of it in C#?

Question

What I have

    string ImageRegPattern = @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";
    string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

What I want

    string[] s;
    s[0] = "http://www.dsa.com/asd/jpg/good.jpg";
    s[1] = "This is a good day";
    s[2] = "http://www.a.com/b.png";
    s[3] = "We are the Best friendshttp://www.c.com";

Bonus:
If the trailing URL can also be split off as shown below, that would be even better, but it is fine if not.

    s[3] = "We are the Best friends";
    s[4] = "http://www.c.com";

My question
I am trying to use the following code to split the string:

    string[] s = Regex.Split(sourceString, ImageRegPattern, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

But the result is not what I need: the Split method removes every substring that matches ImageRegPattern, and I want those substrings to stay in the result. I checked the Regex page on MSDN and could not find a method that does this. So how can I do it?

+4

string arrays split c# regex

Albert gao May 29 '13 at 18:38

4 answers

Don't underestimate the power of regex:

(.*?)([A-Z][\w\s]+(?=http|$))

Explanation:

(.*?) : group and match everything until an uppercase letter is found; this group holds the URL
( : start group
- [A-Z] : matches one uppercase letter
- [\w\s]+ : matches any of a-z, A-Z, 0-9, _, \n, \r, \t, \f and " ", one or more times
- (?=http|$) : lookahead, checks that what follows is http or the end of the line
- ) : close the group (this group holds the text)

online demo

Note: this solution is intended to match the string, not to split it.
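A minimal sketch of how the two groups could be collected with Regex.Matches. The class and method names are hypothetical, and the approach assumes, as the pattern does, that URLs contain no uppercase letters and that each piece of text starts with one. Anything left after the last match is appended as is.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlTextMatcher
{
    // Pattern from the answer: group 1 is the URL, group 2 is the text after it.
    private const string Pattern = @"(.*?)([A-Z][\w\s]+(?=http|$))";

    // Hypothetical helper: returns URLs and text pieces in their original order.
    public static List<string> Extract(string source)
    {
        var parts = new List<string>();
        int end = 0;

        foreach (Match m in Regex.Matches(source, Pattern))
        {
            // Group 1 can be empty when the input starts with text, so skip it then.
            if (m.Groups[1].Length > 0) parts.Add(m.Groups[1].Value);
            parts.Add(m.Groups[2].Value);
            end = m.Index + m.Length;
        }

        // Keep whatever follows the last match (for example a trailing URL).
        if (end < source.Length) parts.Add(source.Substring(end));
        return parts;
    }
}
```

Calling `UrlTextMatcher.Extract(a)` on the string from the question should also separate the trailing `http://www.c.com`, because no uppercase letter follows it and it is therefore left over after the last match.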

+1

Hamza May 29, '13 at 19:15

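It seems to me that you need a multi-step process: first insert a separator around each match, then split on that separator with String.Split:

    // Assumes the input itself never contains the '|' character.
    string resultString = Regex.Replace(rawString, @"(http://.*?/\w+\.(jpg|png|gif))", "|$1|", RegexOptions.IgnoreCase);
    // Drop a leading separator so the first element is not empty.
    if (resultString.StartsWith("|")) resultString = resultString.Substring(1);
    string[] s = resultString.Split('|');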

0

Dave michener May 29 '13 at 18:59

The obvious answer here is, of course, not to use Split at all, but to match the image patterns and retrieve them. That said, it is possible to do it with Split.

    // Non-capturing groups: Regex.Split would otherwise add the captured text to its result.
    string ImageRegPattern = @"(?=(?:http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif))|(?<=(?:\.jpg|\.png|\.gif))";

This matches any position in the string that is followed by an image URL, or any position that is preceded by .jpg, .gif or .png.

I really don't recommend doing this; I'm just saying that you can.
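A minimal sketch of how such a zero-width pattern could be passed to Regex.Split. The class and method names are hypothetical. The groups are written as non-capturing because Regex.Split adds captured text to its result, and empty entries are dropped because a match at the very start of the input produces one.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundSplitDemo
{
    // Zero-width boundaries: just before an image URL, or just after its extension.
    private const string Boundary =
        @"(?=(?:http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif))" +
        @"|(?<=(?:\.jpg|\.png|\.gif))";

    // Hypothetical helper: splits at the boundaries and discards empty pieces.
    public static string[] Split(string source)
    {
        return Regex.Split(source, Boundary)
                    .Where(part => part.Length > 0)
                    .ToArray();
    }
}
```

Because the lookbehind fires after any `.jpg`, `.png` or `.gif`, ordinary text containing those extensions would be split too, which is one reason to treat this as a curiosity.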

0

melwil May 29 '13 at 18:59

FishBasketGordo · Accepted Answer · 2013-05-29T19:01:06+0000

You need something like this method, which first finds all the matches and then collects them into a list along with the unmatched strings between them.

UPDATE: Added a conditional to handle if no matches are found.

    // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
    private static IEnumerable<string> InclusiveSplit(string source, string pattern)
    {
        List<string> parts = new List<string>();
        int currIndex = 0;

        // First, find all the matches. These are your separators.
        MatchCollection matches = Regex.Matches(source, pattern,
            RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

        // If there are no matches, there's nothing to split, so just return a
        // collection with just the source string in it.
        if (matches.Count < 1)
        {
            parts.Add(source);
        }
        else
        {
            foreach (Match match in matches)
            {
                // If the match begins after our current index, we need to add the
                // portion of the source string between the last match and the
                // current match.
                if (match.Index > currIndex)
                {
                    parts.Add(source.Substring(currIndex, match.Index - currIndex));
                }

                // Add the matched value, of course, to make the split inclusive.
                parts.Add(match.Value);

                // Update the current index so we know if the next match has an
                // unmatched substring before it.
                currIndex = match.Index + match.Length;
            }

            // Finally, check if there is a bit of unmatched string at the end of the
            // source string.
            if (currIndex < source.Length)
                parts.Add(source.Substring(currIndex));
        }

        return parts;
    }

The result for your example input would be:

    [0] "http://www.dsa.com/asd/jpg/good.jpg"
    [1] "This is a good day"
    [2] "http://www.a.com/b.png"
    [3] "We are the Best friendshttp://www.c.com"
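As a shorter alternative sketch (not the method above): .NET's Regex.Split includes the text of capturing groups in its result, so wrapping the whole pattern in one capturing group keeps the separators. The class and method names here are made up for illustration, and the pattern passed in is assumed to contain no capturing groups of its own, which holds for the ImageRegPattern in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CapturingSplitDemo
{
    // Hypothetical helper: splits on the pattern but keeps the matched separators.
    public static string[] SplitKeepingMatches(string source, string pattern)
    {
        // One capturing group around the whole pattern makes Regex.Split return
        // the matched text in between the pieces it separates.
        return Regex.Split(source, "(" + pattern + ")", RegexOptions.IgnoreCase)
                    // A match at the start or end of the input yields an empty piece.
                    .Where(part => part.Length > 0)
                    .ToArray();
    }
}
```

If the pattern did contain its own capturing groups, their text would be added to the result as well, so they would need to be rewritten as `(?:...)` first.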
