Differentiating a slash in a string using a regular expression

The program I am writing (in Java) receives input made up of three kinds of parts, separated by a slash (/). A part may be one of the following:

  • A name, matching the regular expression \w*
  • A call, matching the expression \w*\(.*\)
  • A path, matching the expression <.*>|\".*\" . A path may contain slashes.

An example line might look like this:

 bar/foo()/foo(bar)/<foo/bar>/bar/"foo/bar"/foo() 

which has the following structure

 name/call/call/path/name/path/call 

I want to break this string into pieces, and I'm trying to do this using a regex. My current expression captures slashes after calls and paths, but I am having trouble capturing slashes after names without including slashes that may exist within paths. My current expression, just capturing slashes after paths and calls, is as follows:

 (?<=[\)>\"])/ 

How can I extend this expression to also capture slashes after names, without including the slashes inside paths?

+6
4 answers
 (\w+|\w+\([^/]*\)(?:/\w+\([^/]*\))*|<[^>]*>|"[^"]*")(?=/|$) 

extracts the following from the string 'bar/foo()/foo(bar)/<foo/bar>/bar/"foo/bar"/foo()':

  • 'bar'
  • 'foo()/foo(bar)'
  • '<foo/bar>'
  • 'bar'
  • '"foo/bar"'
  • 'foo()'

It doesn't capture the separating slashes, though (why would you need them? Just assume they are there).

The simpler (\w+|\w+\([^/]*\)|<[^>]*>|"[^"]*")(?=/|$) will capture calls separately:

  • "foo()"
  • "foo(bar)"
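
To check these results in Java, both expressions can be run through Matcher.find(). The following is only a minimal sketch: the class name SlashTokenizer, the constants GROUPED and SIMPLE, and the helper tokenize are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlashTokenizer {
    // First expression: consecutive calls end up together in one match.
    static final String GROUPED =
        "(\\w+|\\w+\\([^/]*\\)(?:/\\w+\\([^/]*\\))*|<[^>]*>|\"[^\"]*\")(?=/|$)";

    // Simpler expression: every call is returned as its own match.
    static final String SIMPLE =
        "(\\w+|\\w+\\([^/]*\\)|<[^>]*>|\"[^\"]*\")(?=/|$)";

    // Hypothetical helper: collects group 1 of every match, in input order.
    static List<String> tokenize(String regex, String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "bar/foo()/foo(bar)/<foo/bar>/bar/\"foo/bar\"/foo()";
        // prints [bar, foo()/foo(bar), <foo/bar>, bar, "foo/bar", foo()]
        System.out.println(tokenize(GROUPED, input));
        // prints [bar, foo(), foo(bar), <foo/bar>, bar, "foo/bar", foo()]
        System.out.println(tokenize(SIMPLE, input));
    }
}
```

Both expressions rely on [^/]* inside the parentheses, so they assume that call arguments never contain a slash.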

EDIT: As I usually do, here is a breakdown of the regex:

 (            # begin group 1 (for alternation)
   \w+        # at least one word character
 |            # or...
   \w+        # at least one word character
   \(         # a literal "("
   [^/]*      # anything but a "/", as often as possible
   \)         # a literal ")"
 |            # or...
   <          # a "<"
   [^>]*      # anything but a ">", as often as possible
   >          # a ">"
 |            # or...
   "          # a '"'
   [^"]*      # anything but a '"', as often as possible
   "          # a '"'
 )            # end group 1
 (?=/|$)      # look-ahead: ...followed by a slash or the end of the string
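
In Java, a commented layout like this can be kept in the source code by compiling the pattern with the Pattern.COMMENTS flag, which makes the engine ignore whitespace and anything from # to the end of the line. This is a sketch under that assumption, not part of the original answer; the class name CommentedRegex and the method parts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentedRegex {
    // Same expression as in the breakdown, written one element per line.
    // With Pattern.COMMENTS, whitespace and "# ..." comments are skipped.
    static final Pattern PART = Pattern.compile(
          "(            # begin group 1 (for alternation)\n"
        + "  \\w+       # at least one word character\n"
        + "|            # or...\n"
        + "  \\w+       # at least one word character\n"
        + "  \\(        # a literal opening parenthesis\n"
        + "  [^/]*      # anything but a slash\n"
        + "  \\)        # a literal closing parenthesis\n"
        + "|            # or...\n"
        + "  <[^>]*>    # an angle-bracket path\n"
        + "|            # or...\n"
        + "  \"[^\"]*\" # a double-quoted path\n"
        + ")            # end group 1\n"
        + "(?=/|$)      # look-ahead: a slash or the end of the string\n",
        Pattern.COMMENTS);

    // Hypothetical helper: returns group 1 of every match.
    static List<String> parts(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PART.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parts("bar/foo(bar)/<foo/bar>"));
    }
}
```
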
+3

My first thought was to match only slashes that have an even number of quotes to their left (i.e., a positive look-behind on something like (".*")*), but this ends in an exception:

 Look-behind group does not have an obvious maximum length 

Honestly, I think you would be better off using a Matcher with an or-ed (alternated) version of your components (e.g. \w*|\w*\(.*\)|(<.*>|\".*\") ) and looping with while (matcher.find()).
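
A minimal sketch of such a find() loop follows. It is not the exact expression above: as an assumption for this example, the alternatives are tightened (\w+ instead of \w*, [^/]*, [^>]* and [^"]* instead of .*) and reordered so that a call is tried before a name, and named groups are used to tell the kinds apart. The class name PartClassifier and the method kinds are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartClassifier {
    // One named group per kind of part. "call" comes before "name" so that
    // the name alternative does not match only the "foo" of "foo()".
    static final Pattern PART = Pattern.compile(
        "(?<call>\\w+\\([^/]*\\))"
        + "|(?<path><[^>]*>|\"[^\"]*\")"
        + "|(?<name>\\w+)");

    // Returns the kind of each part, in the order the parts appear.
    static List<String> kinds(String input) {
        List<String> kinds = new ArrayList<>();
        Matcher m = PART.matcher(input);
        while (m.find()) {
            if (m.group("call") != null) {
                kinds.add("call");
            } else if (m.group("path") != null) {
                kinds.add("path");
            } else {
                kinds.add("name");
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        String input = "bar/foo()/foo(bar)/<foo/bar>/bar/\"foo/bar\"/foo()";
        // prints name/call/call/path/name/path/call
        System.out.println(String.join("/", kinds(input)));
    }
}
```

Because a path is consumed as a single match, the slashes inside it are never seen as separators, which avoids the look-behind entirely.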

+3

If the delimiter in your string is not escaped when it appears inside your input, it might not be the best choice of delimiter. However, you do have the luxury that the non-separating slashes only occur inside a fixed pattern. What I suggest:

  • Split the entire line on "/"
  • Go through the parts until you reach the start of a path
  • Collect the parts of the path in a list until you reach the end of the path
  • Rejoin them with "/"
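
The steps above could be sketched in Java roughly as follows. The class name PathAwareSplitter and the method split are hypothetical, and the sketch assumes that a path starts with < or " and ends at the first piece that ends with the matching > or ".

```java
import java.util.ArrayList;
import java.util.List;

class PathAwareSplitter {
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder pending = null; // non-null while inside an unfinished path
        char closer = 0;              // character that ends the current path

        for (String piece : input.split("/", -1)) {
            if (pending == null) {
                char first = piece.isEmpty() ? 0 : piece.charAt(0);
                if (first == '<') {
                    closer = '>';
                } else if (first == '"') {
                    closer = '"';
                } else {
                    parts.add(piece); // a name or a call: keep as is
                    continue;
                }
                // The path may already be complete, e.g. <foo> or "foo".
                if (piece.length() >= 2 && piece.charAt(piece.length() - 1) == closer) {
                    parts.add(piece);
                } else {
                    pending = new StringBuilder(piece);
                }
            } else {
                // Inside a path: put back the slash that split() removed.
                pending.append('/').append(piece);
                if (!piece.isEmpty() && piece.charAt(piece.length() - 1) == closer) {
                    parts.add(pending.toString());
                    pending = null;
                }
            }
        }
        if (pending != null) {
            parts.add(pending.toString()); // unterminated path: keep what was read
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [bar, foo(), foo(bar), <foo/bar>, bar, "foo/bar", foo()]
        System.out.println(split("bar/foo()/foo(bar)/<foo/bar>/bar/\"foo/bar\"/foo()"));
    }
}
```

This avoids regular expressions altogether, at the cost of not handling slashes inside call arguments.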

I highly recommend that you consider avoiding the "/" in your paths in order to make your life easier.

+3

This pattern captures all parts of your example string separately, without including the separators in the results:

 \w+\(.*?\)|<.*>|\".*\"|\w+ 
+1

Source: https://habr.com/ru/post/889000/
