Regular expression to replace all characters surrounded only by letters or digits

I need a regular expression that replaces every run of characters surrounded by letters or digits with a space. I will run the expression from C#, and I'm fine with that part; I'm only stuck on the regex itself.

So, after the replacement:

  • Type-01 becomes Type 01
  • 01)* stays 01)*
  • -Category:Toys stays -Category:Toys
  • White:Black becomes White Black

Current expression

(?<=\w)[^a-zA-Z0-9Category:]+(?=\w) 

Input string

-Category:Toys AND (Teddy Bear Type-01*) OR (Teddy Bear White:Black)

Desired output

-Category:Toys AND (Teddy Bear Type 01*) OR (Teddy Bear White Black)

But I get

-Category:Toys AND Teddy Bear Type 01 OR Teddy Bear White:Black)

I'm not sure whether I've missed something simple or got the wrong end of the stick entirely.

2 answers

You cannot put words in a character class. Every character inside the brackets is added to the class individually, and the order does not matter, so [^a-zA-Z0-9Category:] is equivalent to [^a-zA-Z0-9:].
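To see that the word has no effect, here is a minimal C# sketch (top-level statements, so .NET 6 or later is assumed) that runs the question's class and a version with the word removed over the same input. The sample string is an assumption, written without spaces after the colons, and the variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed sample input (no spaces after the colons).
string input = "-Category:Toys AND (Teddy Bear Type-01*) OR (Teddy Bear White:Black)";

// "Category" only contributes the characters C, a, t, e, g, o, r, y,
// which a-zA-Z already covers, so only the colon adds anything new.
string withWord = Regex.Replace(input, @"(?<=\w)[^a-zA-Z0-9Category:]+(?=\w)", " ");
string withoutWord = Regex.Replace(input, @"(?<=\w)[^a-zA-Z0-9:]+(?=\w)", " ");

Console.WriteLine(withWord);
// -Category:Toys AND Teddy Bear Type 01 OR Teddy Bear White:Black)
Console.WriteLine(withWord == withoutWord); // True
```

Both calls also consume the spaces and parentheses between words, because neither class excludes them; that is why the brackets disappear from the result.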

I'm not sure if this is enough for you, but for your example, this will work:

 (?<=\w)[^a-zA-Z0-9*:()\s]+(?=\w) 

and replace with one space.
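As a rough illustration, this is how the expression above might be applied from C# (top-level statements, .NET 6 or later assumed). The input string is an assumption written without spaces after the colons. Note that this version never touches a colon, so White:Black is left as it is.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed sample input (no spaces after the colons).
string input = "-Category:Toys AND (Teddy Bear Type-01*) OR (Teddy Bear White:Black)";

// Replace runs of characters that are not letters, digits, *, :, parentheses
// or whitespace, but only when a word character sits on both sides.
string pattern = @"(?<=\w)[^a-zA-Z0-9*:()\s]+(?=\w)";
string result = Regex.Replace(input, pattern, " ");

Console.WriteLine(result);
// -Category:Toys AND (Teddy Bear Type 01*) OR (Teddy Bear White:Black)
```

The leading hyphen survives because the lookbehind finds no word character before it.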

I would add that this variant is more Unicode-friendly:

 (?<=\w)[^\p{L}0-9*:()\s]+(?=\w) 

Here \p{L} is the Unicode property that matches a letter in any language.
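The difference only shows up on non-ASCII letters. A small C# sketch (top-level statements, .NET 6 or later assumed) with a hypothetical input containing an umlaut compares the two classes:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input containing a non-ASCII letter (U+00E4, "a" with umlaut).
string input = "B\u00E4r-01";

// ASCII-only class: the umlaut is not in a-zA-Z, so it is treated as a separator.
string ascii = Regex.Replace(input, @"(?<=\w)[^a-zA-Z0-9*:()\s]+(?=\w)", " ");

// \p{L} class: the umlaut counts as a letter, so only the hyphen is replaced.
string unicode = Regex.Replace(input, @"(?<=\w)[^\p{L}0-9*:()\s]+(?=\w)", " ");

Console.WriteLine(ascii == "B r 01");        // True
Console.WriteLine(unicode == "B\u00E4r 01"); // True
```

The ASCII version breaks the word apart because \w in .NET accepts the umlaut as a neighbour while the class itself does not protect it.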

See here at Regexr

Update:

If you want to keep the colon only when it is preceded by "Category", you can do it like this:

 (?<=\w)(?:[^a-zA-Z0-9*()\s:]+|(?<!Category):)(?=\w) 

See it on Regexr

I added the colon to the negated character class so that the first alternative never replaces it. Then I added a second alternative that does replace a colon, but only when it is not preceded by "Category".
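A minimal C# sketch of the updated expression (top-level statements, .NET 6 or later assumed; the input string is an assumption written without spaces after the colons):

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed sample input (no spaces after the colons).
string input = "-Category:Toys AND (Teddy Bear Type-01*) OR (Teddy Bear White:Black)";

// First alternative: runs of separators other than a colon.
// Second alternative: a colon, unless "Category" comes right before it.
string pattern = @"(?<=\w)(?:[^a-zA-Z0-9*()\s:]+|(?<!Category):)(?=\w)";
string result = Regex.Replace(input, pattern, " ");

Console.WriteLine(result);
// -Category:Toys AND (Teddy Bear Type 01*) OR (Teddy Bear White Black)
```

The negative lookbehind is case-sensitive as written, so a lowercase "category:" would lose its colon unless RegexOptions.IgnoreCase is passed to Regex.Replace.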


In C#, you can use the Regex.Replace method.

 using System.Text.RegularExpressions;

 string a = "Category:Toys AND (Teddy Bear Type-01*) OR (Teddy Bear White/Black)";
 // Every character outside the class, including - and /, becomes a space.
 string s = Regex.Replace(a, @"[^()*:A-Za-z0-9]", " ");

Source: https://habr.com/ru/post/1439777/

