Regex to capture groups in traceroute using Java8

I am trying to analyze the traceroute results in Java8 using Regex.

I use the following regular expression to identify groups.

^(\\d*).*[AS(\\d*)]?\\s+([\\w+\\.]+)\\s+\\(([\\d+\\.]+)\\)[\\s+(\\d+\\.\\d+)\\s+ms]+ 

Some of the lines that I need to parse are as follows:

 1 10.33.128.1 (10.33.128.1) 4.452 ms 3.459 ms 3.474 ms
 6 * [AS3356] 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
 * 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
 61.182.180.62 (61.182.180.62) 175.300 ms 203.001 ms

I want to extract the hop number (if available), ASN (if available), hostname, IP and times.

With the expression above, it matches lines 1, 2 and 4 as I want, but only gives me hop, host and ASN.

My code looks like this:

 Pattern hop_pattern = Pattern.compile(
     "^(\\d*).*[AS(\\d*)]?\\s+([\\w+\\.]+)\\s+\\(([\\d+\\.]+)\\)[\\s+(\\d+\\.\\d+)\\s+ms]+");
 Matcher m = hop_pattern.matcher(target);
 while (m.find()) {
     System.out.println("count: " + m.groupCount());
     for (int i = 1; i < m.groupCount() + 1; i++) {
         System.out.println(i + "->" + m.group(i));
     }
 }

I'm not sure if something is wrong with the code or with the regular expression itself. Thanks for the help!

Update: some examples and output

1 [AS0] 10.200.200.200 (10.200.200.200) 37.526 ms 35.793 ms 37.728 ms
Expected result: hop: 1 asn: 0 host name: 10.200.200.200 ip: 10.200.200.200 time: [37.526, 35.793, 37.728]

2 [AS0] scsc-usr-13500-02-eth1-07.xyz.com (10.96.15.3) 37.927 ms 36.122 ms *
Expected result: hop: 2 asn: 0 host name: scsc-usr-13500-02-eth1-07.xyz.com ip: 10.96.15.3 time: [37.927, 36.122]


2 answers

Answer

Part 1

To capture everything you are looking for, you need to use two separate regular expressions. The reason is that a capture group inside a repetition keeps only the last substring it matched, and each of your traceroute lines contains several times (e.g. 4.452 ms, 3.459 ms and 3.474 ms in your first line).
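The "last match only" behaviour is easy to check in Java. The following is a minimal sketch: the class name `RepeatedGroupDemo` and the helper `lastCapture` are made up for illustration, and the pattern is a simplified stand-in for the times portion of the question's regex, not the full expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Returns what group 1 holds once the whole input has matched, or null if it does not match.
    static String lastCapture(String input) {
        // Group 1 sits inside a repeated non-capturing group,
        // so each repetition overwrites the previous capture.
        Pattern p = Pattern.compile("^(?:\\s*(\\d+\\.\\d+)\\s+ms)+$");
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Three times are matched, but only the last one is left in group 1.
        System.out.println(lastCapture("4.452 ms 3.459 ms 3.474 ms")); // prints 3.474
    }
}
```

This is why the times are captured as one string first and split up in a second pass.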

To see which groups are captured, you can use the following regular expression (it uses PCRE syntax and will not work in Java as written, but it makes clear what each group captures).

 ^(?P<hop>\d+)?[\h*]*(?:\[AS(?<ASN>\d*)\])?\h+(?<hostname>[\w\.]+)\h+\((?<ip>[\d+\.]+)\)\h+(?<times>.*?)\h*$ 

With a slight modification, the above regular expression can be used in Java (as far as I know, horizontal whitespace \h and named capture groups (?<name>...) are not supported in Java regex; see the edits at the end of this answer).

 ^(\d+)?[\ \t*]*(?:\[AS(\d*)\])?[\ \t]+([\w\.]+)[\ \t]+\(([\d+\.]+)\)[\ \t]+(.*?)[\ \t]*$ 

Note: both the global (g) and multi-line (m) modifiers are used.
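Java has no g or m suffix, so the two modifiers have to be expressed differently. The sketch below assumes that `Pattern.MULTILINE` stands in for m and a `while (m.find())` loop stands in for g. The class name `HopLineParser`, the method `parse` and the sample input are hypothetical, and the pattern is the Part 1 expression written as a Java string literal with the redundant escapes dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HopLineParser {
    // Part 1 pattern as a Java string literal (backslashes doubled).
    // MULTILINE plays the role of the "m" modifier: ^ and $ match at every line.
    static final Pattern HOP = Pattern.compile(
        "^(\\d+)?[ \\t*]*(?:\\[AS(\\d*)\\])?[ \\t]+([\\w.]+)[ \\t]+\\(([\\d+.]+)\\)[ \\t]+(.*?)[ \\t]*$",
        Pattern.MULTILINE);

    // Returns one String[5] per matched line: hop, ASN, hostname, ip, times.
    // Optional groups that did not take part in the match (hop, ASN) are null.
    static List<String[]> parse(String output) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = HOP.matcher(output);
        while (m.find()) { // looping over find() plays the role of the "g" modifier
            String[] row = new String[5];
            for (int i = 0; i < 5; i++) {
                row[i] = m.group(i + 1);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumption: continuation lines are indented, as traceroute usually prints them.
        String output =
              "1 10.33.128.1 (10.33.128.1) 4.452 ms 3.459 ms 3.474 ms\n"
            + "6 * [AS3356] 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms\n"
            + "  * 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms\n"
            + "  61.182.180.62 (61.182.180.62) 175.300 ms 203.001 ms\n";
        for (String[] row : parse(output)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

One thing to watch for: the pattern requires at least one space or tab before the hostname, so a continuation line that starts with an IP address at column 0 (no indentation, no hop number) would not match.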


Part 2

Run this second regular expression on the times string that you captured in Part 1 to collect a list of all the times.

 ([\d.]+) 
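In Java, the second pass could look roughly like the sketch below. The names `TimeExtractor` and `times` are invented for illustration, and converting each match to a `double` is an assumption about what you want to do with the values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimeExtractor {
    // Part 2 pattern: every run of digits and dots.
    static final Pattern TIME = Pattern.compile("([\\d.]+)");

    // Collects all times found in the string captured by the last group of Part 1.
    static List<Double> times(String captured) {
        List<Double> result = new ArrayList<>();
        Matcher m = TIME.matcher(captured);
        while (m.find()) {
            // Assumes every run of digits and dots is a well-formed decimal number;
            // a stray "." on its own would make parseDouble throw NumberFormatException.
            result.add(Double.parseDouble(m.group(1)));
        }
        return result;
    }

    public static void main(String[] args) {
        // The trailing "*" (a probe that timed out) is simply not matched.
        System.out.println(times("37.927 ms 36.122 ms *")); // prints [37.927, 36.122]
    }
}
```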





Results

Part 1

Input

 1 10.33.128.1 (10.33.128.1) 4.452 ms 3.459 ms 3.474 ms
 6 * [AS3356] 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
 * 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
 61.182.180.62 (61.182.180.62) 175.300 ms 203.001 ms

Output

Match 1

  • Full match 0-60 1 10.33.128.1 (10.33.128.1) 4.452 ms 3.459 ms 3.474 ms
  • Group 1. 1
  • Group 3. 10.33.128.1
  • Group 4. 10.33.128.1
  • Group 5. 4.452 ms 3.459 ms 3.474 ms

Match 2

  • Full match 61-124 6 * [AS3356] 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
  • Group 1. 6
  • Group 2. 3356
  • Group 3. 4.68.72.218
  • Group 4. 4.68.72.218
  • Group 5. 12.432 ms 11.819 ms

Match 3

  • Full match 125-177 * 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
  • Group 3. 4.68.72.218
  • Group 4. 4.68.72.218
  • Group 5. 12.432 ms 11.819 ms

Match 4

  • Full match 178-232 61.182.180.62 (61.182.180.62) 175.300 ms 203.001 ms
  • Group 3. 61.182.180.62
  • Group 4. 61.182.180.62
  • Group 5. 175.300 ms 203.001 ms

Part 2

Input

 4.452 ms 3.459 ms 3.474 ms 

Output

Match 1

  • Full match 0-5 4.452
  • Group 1. 4.452

Match 2

  • Full match 10-15 3.459
  • Group 1. 3.459

Match 3

  • Full match 20-25 3.474
  • Group 1. 3.474





Edits

Credit to Casimir et Hippolyte for pointing out that Java does support named capture groups, as other regex flavors do.

Here's the regex updated accordingly, since Java supports named capture groups of the form (?<name>...):

 ^(?<hop>\d+)?[\t *]*(?:\[AS(?<ASN>\d*)\])?[\t ]+(?<hostname>[\w\.]+)[\t ]+\((?<ip>[\d+\.]+)\)[\t ]+(?<times>.*?)[\t ]*$ 
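As a rough illustration of how the named groups might be read in Java, here is a sketch. `NamedGroupHopParser` and `parseLine` are hypothetical names. Java's syntax for a named group is `(?<name>...)`; the Python/PCRE form `(?P<name>...)` is rejected by `Pattern.compile`, so the hop group is written as `(?<hop>...)` here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupHopParser {
    // The named-group pattern as a Java string literal, using (?<name>...) throughout.
    static final Pattern HOP = Pattern.compile(
        "^(?<hop>\\d+)?[\\t *]*(?:\\[AS(?<ASN>\\d*)\\])?[\\t ]+(?<hostname>[\\w.]+)[\\t ]+"
        + "\\((?<ip>[\\d+.]+)\\)[\\t ]+(?<times>.*?)[\\t ]*$",
        Pattern.MULTILINE);

    // Returns the named fields of the first matching line, or null if nothing matches.
    // Optional groups that did not take part in the match map to null.
    static Map<String, String> parseLine(String line) {
        Matcher m = HOP.matcher(line);
        if (!m.find()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : new String[] {"hop", "ASN", "hostname", "ip", "times"}) {
            fields.put(name, m.group(name));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine(
            "1 [AS0] 10.200.200.200 (10.200.200.200) 37.526 ms 35.793 ms 37.728 ms"));
    }
}
```

Note that the hostname class `[\w\.]` does not include `-`, so a line with a hostname such as scsc-usr-13500-02-eth1-07.xyz.com is not matched by this pattern; widening the class to `[\w.-]` would be one way to allow it.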

Answer

I had a very similar solution in preparation, but I tried to catch it all at once.

It works now:

^(?<hop>\d+)?[\W]*(?:\[AS(?<ASN>\d*)\])?[\t ]+(?<hostname>[\w\.]+)[\t ]+\((?<ip>[\d+\.]+)\)[\t ]+(?<times>(?:(?:[\t ]*(\d+\.\d+)\sms)\s*(?:(\d+\.\d+)\sms[\t ]*)(?:(\d+\.\d+)\sms[\t ]+)?))[\t ]*$

Update: Since \h does not exist in Java, I replaced \h with [\t ], except for a single instance, where I preferred \W.
Addendum: As @Holger noted, \h is available in Java 8.
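A quick way to see what \h does in Java 8 or later is sketched below. The class name `HorizontalWhitespaceDemo` and the helper `fields` are made up for the example.

```java
import java.util.regex.Pattern;

class HorizontalWhitespaceDemo {
    // Splits a line on runs of horizontal whitespace (spaces, tabs and similar).
    static String[] fields(String line) {
        return line.split("\\h+");
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("\\h+", " \t")); // prints true: space and tab are horizontal
        System.out.println(Pattern.matches("\\h", "\n"));   // prints false: a newline is not
        System.out.println(fields("12.432 ms\t11.819 ms").length); // prints 4
    }
}
```

Unlike \s, \h never matches a line terminator, which is what makes it convenient for line-oriented input such as traceroute output.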

However, most likely, it’s even easier to handle the time in an extra step, as shown in the excellent @ctwheels answer.


Source: https://habr.com/ru/post/1271809/
