Regex returns a full string instead of a match

I am trying to retrieve the date from a text file. The content is:

Warehouse Manager Administrative command-line interface - version 7, release 1, level 1.4 (c) Copyright from the corporation and others 1990, 2015. All rights reserved.

Session installed with TSERVER server: Windows Server version 7, release 1, level 5.200 Server date and time: 11/22/2016 15:30:00 Last access: 11/22/2016 15:25:00

Server command ANS8000I.

I need to extract the date/time that follows "Server date and time:". I wrote this regular expression:

/([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})/ 

This works great in regex101. See the example: https://regex101.com/r/MB7yB4/1
However, in PowerShell it behaves differently.

 $var -match "([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})" 

gives

Server date and time: 11/22/2016 16:30:00 Last access: 11/22/2016 15:37:19

and

 $var -match "([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})" 

gives nothing.

I'm not sure why the match is not the same.
Thanks for any help!

+5
source share
5 answers

With a single string on the left-hand side, the -match operator returns a Boolean indicating whether a match was found. In addition, it populates the automatic $matches variable with the match data (the whole match and the values of any capture groups). You just need to access the whole match:

 if($var -match '[0-9]{1,2}/[0-9]{1,2}/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}') { $matches[0] } 

See "Using -match and the $matches variable in PowerShell".

Note that there is no need to escape the / symbol in PowerShell regular expressions: the character is not special, and regex delimiters (the outer /.../ used in JS or PHP) are not used when defining a regular expression in PowerShell.

+1
source

This is because you are matching against multiple lines, so -match returns each line that matches. To pull the individual match out of a line, use the following:

 foreach ($line in $var) { if ($line -match "([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})") {write-output $matches[0]}} 
+1
source

If you are dealing with long REs, it makes sense to use named capture groups. If you later rearrange the RE, the group names stay the same. If the match can span multiple lines, use the inline options (?smi), and so that . can also match CR/LF, read the content with the -Raw switch. I use \d instead of [0-9] to save 3 characters.

 $var = Get-Content File.txt -Raw
 # (?smi): . also matches newlines, ^/$ match at line breaks, case-insensitive
 if ($var -match "(?smi)Server date and time: (?<ServerDT>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}).*access: (?<LastAc>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2})") {
     "ServerDT : "+$matches.ServerDT
     "LastAccess: "+$matches.LastAc
 }

Output

 ServerDT : 11/22/2016 15:30:00
 LastAccess: 11/22/2016 15:25:00
+1
source

In such cases, I still prefer to call the Matches method of the .NET [regex] class directly. In my experience it is faster and more explicit, though more verbose. If you are sure that the first date found is the one you want, you can use:

 [regex]::Matches($var,'\d{1,2}/\d{1,2}/\d{4}\s\d{1,2}:\d{1,2}:\d{1,2}')[0].value 

Personally, I would include "Server date and time:" in the regular expression and then strip it from the result (and parse the cleaned result into a DateTime object, if necessary).

 ([regex]::Matches($a,'Server\sdate\sand\stime:\s\d{1,2}/\d{1,2}/\d{4}\s\d{1,2}:\d{1,2}:\d{1,2}').value) -replace "Server date and time: ",'' 
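The optional DateTime step mentioned above could look roughly like the following. This is only a sketch: the $text value is a made-up line modeled on the sample in the question, and the format string assumes month/day/year order with a 24-hour clock, which the sample suggests but does not prove.

```powershell
# Hypothetical input line, modeled on the sample in the question.
$text = 'Server date and time: 11/22/2016 15:30:00 Last access: 11/22/2016 15:25:00'

# Take the first timestamp found by the .NET regex class.
$stamp = [regex]::Match($text, '\d{1,2}/\d{1,2}/\d{4}\s\d{1,2}:\d{1,2}:\d{1,2}').Value

# Assumption: month/day/year order and a 24-hour clock.
# The invariant culture keeps / and : from being reinterpreted by the local culture.
$culture = [System.Globalization.CultureInfo]::InvariantCulture
$serverTime = [datetime]::ParseExact($stamp, 'M/d/yyyy H:m:s', $culture)

# $serverTime is now a DateTime, so it can be compared or used in date arithmetic.
$serverTime
```

ParseExact throws if the text does not fit the format, which is usually preferable to silently getting a wrong date from a culture-dependent cast.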

P.S. A quick tip: avoid using var as a variable name, even in tests. It really is a bad habit.

0
source

To complement Wiktor Stribiżew's helpful answer, which contains many useful pointers and an effective solution, but does not explain how the -match operator behaves when its LHS is an array:

  • The behavior of the -match operator changes if the LHS is an array of strings: the matching array elements are returned instead of a Boolean. Effectively, -match then filters the array.
    • You probably read the contents of your file into $var with plain Get-Content, which returns the lines as a string array, not as a single string. In PSv3+, adding the -Raw switch reads the entire file as a single string.
    • Your regular expression matches (only) the 5th element of the input array (the 5th line of the file), so that element is returned: the entire line.
  • As explained in Wiktor's answer, you need to read the entries of the automatically populated $Matches hash table to get the details of the most recent -match operation: $Matches[0] contains what the regular expression matched as a whole, $Matches[1] what the first (unnamed) capture group captured ($Matches[2] for the second, ...), and $Matches['<name>'] holds the named capture groups, as shown in LotPing's helpful answer. ($Matches.0 is just an alternative syntax for $Matches[0], for example.)
  • It is better to use single quotation marks ('...') when defining regular expressions, so that PowerShell's own string interpolation, which applies to double-quoted strings ("..."), does not interfere.
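The difference between an array LHS and a scalar LHS can be seen in a short sketch. The $lines array is invented sample data standing in for what Get-Content would return, and the variable names are illustrative only.

```powershell
# Hypothetical sample data standing in for the output of Get-Content.
$lines = @(
    'Session established with server TSERVER',
    'Server date and time: 11/22/2016 15:30:00 Last access: 11/22/2016 15:25:00',
    'Server command ANS8000I.'
)
$re = '\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}'

# Array LHS: -match acts as a filter and returns the matching elements,
# here the whole second line rather than just the timestamp.
$filtered = $lines -match $re

# Scalar LHS: -match returns a Boolean and populates $Matches.
$found = $lines[1] -match $re
$timestamp = $Matches[0]   # the first timestamp on that line
```

This is why the question saw a full line come back: the filter form returns input elements, and only the scalar form makes the matched text available through $Matches.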

When it comes to extracting a substring using a regular expression, using -replace often allows a more succinct solution:

 $var -join "`n" -replace '(?s).*?(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}).*', '$1' 

The extra step -join "`n" is needed to assemble the array of strings in $var into one string before passing it to -replace.
The explanation below shows how to use Get-Content -Raw to read the entire file as a single string to begin with.

Explanation:

 # Read the text file as a *single* string, using -Raw.
 # Note: Without -Raw, you get an *array* of strings representing
 # the individual lines.
 $var = Get-Content -Raw file.txt

 # Define the regex that matches the *entire* input,
 # with a single capture group capturing the substring of interest.
 # The regex:
 # - is prefixed with an inline-option expression, (?s), which ensures
 #   that . also matches a newline,
 # - starts with .*?, a non-greedy expression matching any
 #   sequence of characters at the start of the input,
 # - followed by the original capture-group regex (though without escaping / as \/,
 #   because that is not necessary in PowerShell, and with \d used instead of [0-9]),
 # - ends with .*, a greedy expression that matches everything through the
 #   end of the input.
 $re = '(?s).*?(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}).*'

 # Using -replace, we replace the entire input string - by virtue
 # of the overall regex matching the entire string - with only
 # what the capture group captured ($1).
 # The net effect is that only the capture group value is output.
 # With the sample input, this outputs '11/22/2016 15:30:00', the first
 # timestamp encountered.
 $var -replace $re, '$1'
0
source

Source: https://habr.com/ru/post/1260205/

