Capturing a regex group enclosed by a special character

I am trying to find words that are enclosed in tilde signs (~).

 e.g. ~albert~ is a ~good~ boy.

I know this is possible using ~.+?~, and it already works for me. But there are special cases where I need to match a nested tilde clause.

 e.g. ~The ~spectacle~~ was ~broken~

In the above example, I need to capture "The spectacle", "spectacle" and "broken" separately. They will be translated either one word at a time or together with an accompanying article (a, an, the, whatever). The reason is that in my system:

1) 'The spectacle' requires a separate translation in specific cases.
2) 'spectacle' also needs a translation in specific cases.
3) IF a translation exists for 'The spectacle', we will use that, ELSE
   we will use the translation for 'spectacle' alone.

Another example to explain this:

 ~The ~spectacle~~ was ~broken~, but that was not the same ~spectacle~ 
  that was given to ~me~.

In the above example, I will have a translation for:

 1) 'The spectacle' (because a translation exists for 'The spectacle'; otherwise I would have translated only 'spectacle' on its own)
 2) 'broken'
 3) 'spectacle'
 4) 'me'

The simple pattern "~.+?~" does not handle this nesting. Is there a regex that can?
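To make the problem concrete, here is a minimal Java sketch that runs the lazy pattern ~.+?~ from the question against both the simple and the nested example. The class and method names (LazyTildeDemo, matches) are made up for illustration; only java.util.regex is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyTildeDemo {
    // The lazy pattern from the question: a tilde, at least one character, the next tilde.
    private static final Pattern LAZY = Pattern.compile("~.+?~");

    // Returns every non-overlapping match, tildes included.
    static List<String> matches(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = LAZY.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Simple case: prints [~albert~, ~good~]
        System.out.println(matches("~albert~ is a ~good~ boy."));
        // Nested case: prints [~The ~, ~~ was ~]
        System.out.println(matches("~The ~spectacle~~ was ~broken~"));
    }
}
```

In the nested case the pattern pairs each tilde with the very next one, so none of the three wanted groups comes out.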


If your regex engine supports conditionals (PCRE, for example), you could try something like this:

(~(?(?=.*?~~.*?~).*?~.*?~.*?~|[^~]+?~))

(~(?(?=.*?~[A-Za-z]*?~.*?~).*?~.*?~.*?~|[^~]+?~))


(~(?:.*?~.*?~){0,2}.*?~)
                 ^^ change to max depth
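A quick way to try the bounded-depth pattern in Java is sketched below. This is only an illustration: the class and method names are hypothetical, and the two conditional patterns are left out because java.util.regex does not support the (?(condition)yes|no) syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedDepthDemo {
    // The bounded-depth pattern from the answer; {0,2} is the maximum depth knob.
    private static final Pattern BOUNDED =
            Pattern.compile("(~(?:.*?~.*?~){0,2}.*?~)");

    // Returns every non-overlapping match, tildes included.
    static List<String> matches(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = BOUNDED.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [~The ~spectacle~~ was ~broken~]
        System.out.println(matches("~The ~spectacle~~ was ~broken~"));
    }
}
```

On this input the pattern returns the whole sentence as a single match, so pulling out the inner groups would still take a second pass over each match.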


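With the same character opening and closing a group, how would you know which tildes belong together? Both of these readings are possible: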

~This text could be nested ~ so could this~ and this~ this ~Also this~
|                          |              |_________|      |         |
|                          |_______________________________|         |
|____________________________________________________________________|

~This text could be nested ~ so could this~ and this~ this ~Also this~
|                          |              |         |      |_________|
|                          |______________|         |
|___________________________________________________|

The same goes for your example:

~The ~spectacle~~ was ~broken~, but that was not the same ~spectacle~ that was given to ~me~.
|    |         ||_____|      |                            |         |
|    |         |_____________|                            |         |
|    |____________________________________________________|         |
|___________________________________________________________________|

~The ~spectacle~~ was ~broken~, but that was not the same ~spectacle~ that was given to ~me~.
|    |_________||     |______|                            |_________|                   |__|
|_______________|

Which reading is the intended one?

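As @tbraun suggested, use distinct opening and closing characters:

{This text can be {properly {nested}} without problems} because {the compiler {can {see {the}}} start and end points} easily.

Or skip the regex and scan the string yourself, tracking the nesting depth. In Java, for example: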

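import java.util.ArrayList;
import java.util.List;

public class BraceGroups {
    // Returns every top-level {...} group in myString, braces included.
    static List<String> topLevelGroups(String myString) {
        int depth = 0;
        int lastIndex = 0; // index of the '{' that opened the current top-level group
        List<String> results = new ArrayList<>();

        for (int i = 0; i < myString.length(); i += 1) {
            char c = myString.charAt(i);
            if (c == '{') {
                depth += 1;
                if (depth == 1) {
                    lastIndex = i;
                }
            } else if (c == '}') {
                depth -= 1;
                if (depth == 0) {
                    // A top-level group just closed: keep it
                    results.add(myString.substring(lastIndex, i + 1));
                }
                if (depth < 0) {
                    // Balancing problem: handle the error here
                }
            }
        }
        return results;
    }
}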
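The loop above keeps only the top-level groups. Since the question also needs the inner ones ('The spectacle' as well as 'spectacle'), one possible variation is to remember every open brace on a stack and record a group each time a brace closes. This is a sketch under the assumption that {} delimiters are used; the names NestedGroups and allGroups are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class NestedGroups {
    // Collects the text of every {...} group at any depth, with inner braces stripped.
    static List<String> allGroups(String text) {
        Deque<Integer> openPositions = new ArrayDeque<>(); // indexes of unmatched '{'
        List<String> groups = new ArrayList<>();

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                openPositions.push(i);
            } else if (c == '}') {
                if (openPositions.isEmpty()) {
                    throw new IllegalArgumentException("Unbalanced '}' at index " + i);
                }
                int start = openPositions.pop();
                // Text between the matching braces, without the braces of nested groups
                String inner = text.substring(start + 1, i);
                groups.add(inner.replace("{", "").replace("}", ""));
            }
        }
        if (!openPositions.isEmpty()) {
            throw new IllegalArgumentException("Unclosed '{' at index " + openPositions.peek());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints [spectacle, The spectacle, broken]
        System.out.println(allGroups("{The {spectacle}} was {broken}"));
    }
}
```

Groups come out in the order they close, so an inner group always appears before the group that contains it. The lookup rule from the question (prefer 'The spectacle', fall back to 'spectacle') can then be applied to this list.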


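Use distinct delimiters such as {}.

The pattern \{[^{]*?\} matches only the innermost {...} groups. For the input:

{The {spectacle}} was {broken}

it matches:

{spectacle}
{broken}

Removing the braces of those matches and running the pattern again gives:

{The spectacle}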
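The pattern \{[^{]*?\} can only match a group that has no further { inside it, so it can be applied in passes: collect the innermost groups, strip their braces, and scan again. A minimal Java sketch of that idea follows; the class and method names are hypothetical, and the multi-pass loop is an assumption about how the pattern is meant to be used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InnermostFirst {
    // A {...} group containing no further '{'; group 1 is the text between the braces.
    private static final Pattern INNERMOST = Pattern.compile("\\{([^{]*?)\\}");

    // Repeatedly collects innermost groups, then removes their braces and rescans.
    static List<String> collect(String text) {
        List<String> found = new ArrayList<>();
        while (true) {
            Matcher m = INNERMOST.matcher(text);
            boolean any = false;
            while (m.find()) {
                found.add(m.group(1));
                any = true;
            }
            if (!any) {
                break; // no braces left to resolve
            }
            // Keep the inner text but drop the braces that were just matched
            text = m.replaceAll("$1");
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [spectacle, broken, The spectacle]
        System.out.println(collect("{The {spectacle}} was {broken}"));
    }
}
```

Each pass peels off one nesting level, so the loop runs once per level of depth plus a final pass that finds nothing.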

Source: https://habr.com/ru/post/1589405/

