RegEx for the analysis of chemical formulas

I need a way to split a chemical formula into its components. The result should look like this:

   Ag3PO4 -> [Ag3, P, O4]
      H2O -> [H2, O]
   CH3OOH -> [C, H3, O, O, H]
Ca3(PO4)2 -> [Ca3, (PO4)2]

I don't know regular expression syntax, but I know that I need something like this:

[Optional opening parenthesis] [Uppercase letter] [0 or more lowercase letters] [0 or more digits] [Optional closing parenthesis] [0 or more digits]

This worked:

// Match either an element symbol with an optional count,
// or a parenthesized group with an optional multiplier.
NSRegularExpression *regex = [NSRegularExpression
                              regularExpressionWithPattern:@"[A-Z][a-z]*\\d*|\\([^)]+\\)\\d*"
                              options:0
                              error:nil];
NSArray *tests = @[@"Ca3(PO4)2", @"HCl", @"CaCO3", @"ZnCl2", @"C7H6O2", @"BaSO4"];
for (NSString *testString in tests)
{
    NSLog(@"Testing: %@", testString);
    NSArray *results = [regex matchesInString:testString options:0 range:NSMakeRange(0, [testString length])];
    NSMutableArray *matches = [NSMutableArray arrayWithCapacity:[results count]];

    for (NSTextCheckingResult *match in results) {
        // Range 0 is the whole match; the pattern has no capture groups.
        NSRange matchRange = [match rangeAtIndex:0];
        [matches addObject:[testString substringWithRange:matchRange]];
        NSLog(@"%@", [matches lastObject]);
    }
}
+6
4 answers

(PO4)2 really stands apart from the rest.

Let's start with simple matching elements without parentheses:

[A-Z][a-z]?\d*

Using the regular expression above, we can successfully parse Ag3PO4, H2O and CH3OOH.
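As a quick check of the element-only pattern, here is a minimal sketch in Python rather than the Objective-C used in the question (the choice of language and the name ELEMENT are assumptions for illustration; the pattern itself is unchanged):

```python
import re

# One uppercase letter, an optional lowercase letter, then optional digits.
ELEMENT = re.compile(r"[A-Z][a-z]?\d*")

for formula in ["Ag3PO4", "H2O", "CH3OOH"]:
    # findall returns every non-overlapping match, left to right.
    print(formula, "->", ELEMENT.findall(formula))
```

Each formula is split into element symbols with their counts attached, matching the lists asked for in the question.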

Next we need an expression for the parenthesized group. The group on its own can be matched with:

\(.*?\)\d+

Then we combine the two with an "or":

[A-Z][a-z]?\d*|\(.*?\)\d+


This works for the given cases.

Note: it will have problems with nested parentheses, e.g. Co3(Fe(CN)6)2
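The behaviour of the lazy group pattern on flat and nested formulas can be checked with a short Python sketch (Python and the name FLAT are assumptions for illustration, not part of the answer):

```python
import re

# Element with optional count, or a lazily matched (...) group followed by digits.
FLAT = re.compile(r"[A-Z][a-z]?\d*|\(.*?\)\d+")

# A single level of parentheses is kept together with its multiplier.
print(FLAT.findall("Ca3(PO4)2"))      # ['Ca3', '(PO4)2']

# With nesting, the lazy .*? stops at the first ")" that is followed by a digit,
# so the outer group is cut short and its multiplier is lost.
print(FLAT.findall("Co3(Fe(CN)6)2"))  # ['Co3', '(Fe(CN)6']
```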

To handle that case, you can use the following regex:

[A-Z][a-z]?\d*|(?<!\([^)]*)\(.*\)\d+(?![^(]*\))


For Objective-C, you can use an expression without lookarounds:

[A-Z][a-z]?\d*|\([^()]*(?:\(.*\))?[^()]*\)\d+
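The lookaround-free pattern uses only basic syntax, so it can also be tried outside Objective-C. A minimal Python sketch (the language and the name NESTED are assumptions for illustration):

```python
import re

# Element with optional count, or a group that may contain one nested (...) run.
NESTED = re.compile(r"[A-Z][a-z]?\d*|\([^()]*(?:\(.*\))?[^()]*\)\d+")

# The outer group now stays whole, including its multiplier.
print(NESTED.findall("Co3(Fe(CN)6)2"))  # ['Co3', '(Fe(CN)6)2']

# Formulas with a single level of parentheses still work.
print(NESTED.findall("Ca3(PO4)2"))      # ['Ca3', '(PO4)2']
```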


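Or a regex with repetition, in case there is anything like A(B(CD)3E(FG)4)5, with several parenthesized blocks inside one group: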

[A-Z][a-z]?\d*|\((?:[^()]*(?:\(.*\))?[^()]*)+\)\d+


+15

This should work:

/(\(?)([A-Z])([a-z]*)([0-9]*)(\))?([0-9]*)/g

You can test regular expressions here: http://refiddle.com/

+3

This pattern should work:

[A-Z][a-z]*\d*|\([^)]+\)\d*

\d is a shortcut for [0-9], and [^)] matches any character except a closing parenthesis.
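For reference, here is what this pattern produces for the test strings used in the question, sketched in Python instead of Objective-C (the language, TOKEN and EXPECTED are assumptions for illustration):

```python
import re

# Element with any number of lowercase letters and optional digits,
# or a (...) group without a nested ")" and with an optional multiplier.
TOKEN = re.compile(r"[A-Z][a-z]*\d*|\([^)]+\)\d*")

EXPECTED = {
    "Ca3(PO4)2": ["Ca3", "(PO4)2"],
    "HCl": ["H", "Cl"],
    "CaCO3": ["Ca", "C", "O3"],
    "ZnCl2": ["Zn", "Cl2"],
    "C7H6O2": ["C7", "H6", "O2"],
    "BaSO4": ["Ba", "S", "O4"],
}

for formula, parts in EXPECTED.items():
    # Compare the regex output with the hand-written split.
    print(formula, "->", TOKEN.findall(formula), TOKEN.findall(formula) == parts)
```

Because [^)] stops at the first closing parenthesis, this pattern is meant for formulas with a single level of parentheses.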

+3

A recursive regex (PCRE syntax, with the g and m flags):
([A-Z][a-z]*\d*)|(\((?:[^()]+|(?R))*\)\d*)
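The (?R) recursion is not available in every engine; Python's built-in re module, for example, does not support it. Where recursion is missing, arbitrary nesting can be handled by counting parenthesis depth by hand. The following is a sketch of that substitute technique, not the recursive regex itself; split_balanced and its error messages are hypothetical names chosen for illustration:

```python
import re

ELEMENT = re.compile(r"[A-Z][a-z]*\d*")  # element symbol with optional count
DIGITS = re.compile(r"\d*")              # optional multiplier after a group


def split_balanced(formula):
    """Split a formula into top-level parts, keeping nested groups whole."""
    parts = []
    i = 0
    while i < len(formula):
        if formula[i] == "(":
            depth = 0
            j = i
            # Walk forward until the parenthesis opened at i is closed.
            while j < len(formula):
                if formula[j] == "(":
                    depth += 1
                elif formula[j] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if depth != 0:
                raise ValueError("unbalanced parentheses in " + formula)
            # Attach the multiplier that follows the closing parenthesis, if any.
            end = DIGITS.match(formula, j + 1).end()
            parts.append(formula[i:end])
            i = end
        else:
            m = ELEMENT.match(formula, i)
            if m is None:
                raise ValueError("unexpected character at index %d" % i)
            parts.append(m.group())
            i = m.end()
    return parts


print(split_balanced("Co3(Fe(CN)6)2"))     # ['Co3', '(Fe(CN)6)2']
print(split_balanced("A(B(CD)3E(FG)4)5"))  # ['A', '(B(CD)3E(FG)4)5']
```

Unlike the regex variants, this version rejects unbalanced input with an error instead of silently skipping characters.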

+2

Source: https://habr.com/ru/post/1540172/

