Splitting a string on / unless it is inside []

I am trying to split a string that represents an XPath expression, for example:

string myPath = "/myns:Node1/myns:Node2[./myns:Node3=123456]/myns:Node4"; 

I need to split on "/" (the "/" itself is excluded from the results, as with a normal split), but only when the "/" is not inside "[...]". A "/" inside brackets must not be split on and must stay in the result.

A plain string[] result = myPath.Split("/".ToCharArray()) gives me:

    result[0]: //Empty string, this is ok
    result[1]: myns:Node1
    result[2]: myns:Node2[.
    result[3]: myns:Node3=123456]
    result[4]: myns:Node4

result[2] and result[3] should be combined, so that I end up with:

    result[0]: //Empty string, this is ok
    result[1]: myns:Node1
    result[2]: myns:Node2[./myns:Node3=123456]
    result[3]: myns:Node4

Since I'm not confident with regex, I tried manually recombining the results into a new array after splitting. What worries me is that, although it is trivial to make that work for this example, a regex seems like the better option once I get more complex XPaths.

For the record, I have looked at the following questions:
Regular Expression Separating String
C# Regex Split - commas outside quotes
Separate a line that has spaces if they are not enclosed in "quotes"?

While they should be enough to solve my problem, I ran into several confusing aspects that keep them from helping me.
As a regex newbie, I find the answers in the first two links hard to interpret and learn from. They look for quotes, which are the same character on the left and on the right, so translating them to [ and ] confuses me, and trial and error isn't teaching me anything; it just frustrates me more. I can understand fairly basic regex, but what those answers do goes a bit beyond what I currently understand, even with the explanation in the first link. As for the third link, I won't have access to LINQ, because the code will run on an old version of .NET.

+5
4 answers

XPath is a complex language. Trying to split an XPath expression on top-level slashes fails in many situations, for example:

    /myns:Node1/myns:Node2[./myns:Node3=123456]/myns:Node4
    string(/myns:Node1/myns:Node2)

I suggest a different approach that covers more cases. Instead of trying to split, try to match each part between the slashes with the Regex.Matches(String, String) method. The advantage of this approach is that you can freely describe what these parts look like:

    string pattern = @"(?xs)
        [^][/()]+ # all that isn't a slash or a bracket
        (?: # predicates (possibly nested)
            \[
            (?:
                [^]['""]
              |
                (?<c>\[)
              |
                (?<-c>] )
              |
                "" (?> [^""\\]* (?: \\. [^""\\]* )* ) "" # quoted parts
              |
                ' (?> [^'\\]* (?: \\. [^'\\]* )* ) '
            )*?
            (?(c)(?!$)) # check if brackets are balanced
            ]
          | # same thing for round brackets
            \(
            (?:
                [^()'""]
              |
                (?<d>\()
              |
                (?<-d>\) )
              |
                "" (?> [^""\\]* (?: \\. [^""\\]* )* ) ""
              |
                ' (?> [^'\\]* (?: \\. [^'\\]* )* ) '
            )*?
            (?(d)(?!$))
            \)
        )*
      |
        (?<![^/])(?![^/]) # empty string between slashes, at the start or end
    ";

Note: to make sure the string has been parsed completely, you can add something like |\z(?<=(.)) to the end of the pattern. You can then test whether the capture group exists to find out whether you reached the end of the string. (You can also use the match position, the match length and the length of the string.)
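
To illustrate the "match instead of split" idea in isolation, here is a minimal sketch that avoids LINQ. It deliberately uses a much simpler pattern than the one above (an assumption made for readability): it only understands flat, non-nested [...] predicates and ignores quotes and round brackets. The names StepMatcher and MatchSteps are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StepMatcher
{
    // Simplified, hypothetical pattern: a run of characters that are neither
    // slashes nor square brackets, followed by zero or more flat [...] predicates.
    private const string StepPattern = @"[^/\[\]]+(?:\[[^\]]*\])*";

    public static List<string> MatchSteps(string xpath)
    {
        List<string> steps = new List<string>();
        // Each match is one step between top-level slashes.
        foreach (Match m in Regex.Matches(xpath, StepPattern))
        {
            steps.Add(m.Value);
        }
        return steps;
    }

    public static void Main()
    {
        string myPath = "/myns:Node1/myns:Node2[./myns:Node3=123456]/myns:Node4";
        foreach (string step in MatchSteps(myPath))
        {
            Console.WriteLine(step);
        }
        // Prints:
        // myns:Node1
        // myns:Node2[./myns:Node3=123456]
        // myns:Node4
    }
}
```

Unlike a split, this simplified pattern does not produce the empty string before the leading slash.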


+5

If you need a regex as complex as the one in Casimir et Hippolyte's answer, perhaps a regex is not the best option in this case. As a non-regex alternative, here is what processing the XPath string manually looks like:

    // Requires: using System; using System.Collections.Generic; using System.Linq;
    public string[] Split(string input, char splitChar, char groupStart, char groupEnd)
    {
        List<string> splits = new List<string>();
        int startIdx = 0;
        int groupNo = 0; // current bracket nesting depth
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == splitChar && groupNo == 0)
            {
                // Separator outside any group: close the current segment.
                splits.Add(input.Substring(startIdx, i - startIdx));
                startIdx = i + 1;
            }
            else if (input[i] == groupStart)
            {
                groupNo++;
            }
            else if (input[i] == groupEnd)
            {
                groupNo = Math.Max(groupNo - 1, 0);
            }
        }
        splits.Add(input.Substring(startIdx, input.Length - startIdx));
        // Drop empty segments, such as the one before a leading separator.
        return splits.Where(s => !string.IsNullOrEmpty(s)).ToArray();
    }

Personally, I find this much easier to understand and implement. You can use it like this:

    var input = "/myns:Node1/myns:Node2[./myns:Node3=123456]/myns:Node4[text()='some[] brackets']";
    var split = Split(input, '/', '[', ']');

As a result, you get the following:

    split[0] = "myns:Node1"
    split[1] = "myns:Node2[./myns:Node3=123456]"
    split[2] = "myns:Node4[text()='some[] brackets']"
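
The question rules out LINQ, and the final Where(...) call in the method above depends on it. A minimal sketch of the same depth-counting idea without LINQ could look like this; the class name BracketAwareSplitter is made up for the example, and empty segments are skipped while collecting rather than filtered afterwards.

```csharp
using System;
using System.Collections.Generic;

public static class BracketAwareSplitter
{
    public static string[] Split(string input, char splitChar, char groupStart, char groupEnd)
    {
        List<string> parts = new List<string>();
        int start = 0; // index where the current segment begins
        int depth = 0; // current bracket nesting depth

        // Run one step past the last character so the final segment is flushed too.
        for (int i = 0; i <= input.Length; i++)
        {
            bool atEnd = i == input.Length;
            if (atEnd || (input[i] == splitChar && depth == 0))
            {
                // Keep only non-empty segments (replaces the LINQ filter).
                if (i > start)
                {
                    parts.Add(input.Substring(start, i - start));
                }
                start = i + 1;
            }
            else if (input[i] == groupStart)
            {
                depth++;
            }
            else if (input[i] == groupEnd && depth > 0)
            {
                depth--;
            }
        }
        return parts.ToArray();
    }

    public static void Main()
    {
        string input = "/myns:Node1/myns:Node2[./myns:Node3=123456]/myns:Node4[text()='some[] brackets']";
        foreach (string part in Split(input, '/', '[', ']'))
        {
            Console.WriteLine(part);
        }
    }
}
```

Like the original method, this counts brackets inside quoted strings as well, so it only behaves correctly when quoted brackets happen to be balanced.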
+2

The second link you posted is actually perfect for your needs. All it needs is a small tweak to detect brackets instead of quotes:

 \/(?=(?:[^[]*\[[^\]]*])*[^]]*$) 

Basically, it matches a slash only if everything after it, up to the end of the string, consists of complete [...] pairs. A slash that sits inside an unclosed bracket is therefore skipped. You can use it like this:

    string[] matches = Regex.Split(myPath, "\\/(?=(?:[^[]*\\[[^\\]]*])*[^]]*$)");
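
For completeness, here is a small self-contained sketch of that call applied to the path from the question. The pattern is the one from this answer, written as a verbatim string; the class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadSplitDemo
{
    public static string[] SplitPath(string xpath)
    {
        // Split on a slash only when the rest of the string consists of
        // complete [...] pairs, i.e. the slash is not inside a predicate.
        return Regex.Split(xpath, @"\/(?=(?:[^[]*\[[^\]]*])*[^]]*$)");
    }

    public static void Main()
    {
        string myPath = "/myns:Node1/myns:Node2[./myns:Node3=123456]/myns:Node4";
        string[] result = SplitPath(myPath);
        for (int i = 0; i < result.Length; i++)
        {
            Console.WriteLine("result[" + i + "]: " + result[i]);
        }
        // Prints:
        // result[0]:
        // result[1]: myns:Node1
        // result[2]: myns:Node2[./myns:Node3=123456]
        // result[3]: myns:Node4
    }
}
```

The empty first element comes from the leading slash, which matches the output the question asks for.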
+1
Try this:

    \/(?![^\[]*\])

See the demo: https://regex101.com/r/uLcWux/1

In C#, put it in a verbatim (@) string, or escape the backslashes: \\/(?![^\\[]*\\])

P.S. This only works for simple XPaths that have no nested brackets and no [] inside quotes.
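
The following sketch shows both the working case and that limitation. The second path, /a[b/c[d]]/e, is a made-up example with a nested predicate, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleLookaheadDemo
{
    public static string[] SplitSimple(string xpath)
    {
        // Split on a slash unless a ']' follows before any '['.
        return Regex.Split(xpath, @"\/(?![^\[]*\])");
    }

    public static void Main()
    {
        // Flat predicate: the slash inside [...] is left alone.
        Console.WriteLine(string.Join(" | ",
            SplitSimple("/myns:Node1/myns:Node2[./myns:Node3=123456]/myns:Node4")));

        // Nested predicate: the slash inside the outer [...] is split on as well.
        Console.WriteLine(string.Join(" | ", SplitSimple("/a[b/c[d]]/e")));
    }
}
```

For the nested path, the predicate is cut into a[b and c[d]], because the inner [ hides the closing bracket from the lookahead.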

+1

Source: https://habr.com/ru/post/1260534/

