Why does `-lt` behave differently for characters and strings?

I recently answered an SO question about using -lt or -gt with strings. My answer was based on something I had read earlier, which said that -lt compares one char from each string at a time, until an ASCII value is not equal to the other. At that point, the result (lower / equal / greater) is decided. By this logic, "Less" -lt "less" should return True, because L has a lower ASCII byte value than l, but it does not:

    [System.Text.Encoding]::ASCII.GetBytes("Less".ToCharArray())
    76
    101
    115
    115

    [System.Text.Encoding]::ASCII.GetBytes("less".ToCharArray())
    108
    101
    115
    115

    "Less" -lt "less"
    False

It seems that I may have missed an important piece: the test is case-insensitive.

    #L has a lower ASCII-value than l. PS doesn't care. They're equal
    "Less" -le "less"
    True

    #The last s has a lower ASCII-value than t. PS cares.
    "Less" -lt "lest"
    True

    #T has a lower ASCII-value than t. PS doesn't care
    "LesT" -lt "lest"
    False

    #Again PS doesn't care. They're equal
    "LesT" -le "lest"
    True

Then I tried to test char vs single-character-string:

    [int][char]"L"
    76

    [int][char]"l"
    108

    #Using strings, it's case-insensitive. L = l
    "L" -lt "l"
    False
    "L" -le "l"
    True
    "L" -gt "l"
    False

    #Using chars, it's case-sensitive! L < l
    ([char]"L") -lt ([char]"l")
    True
    ([char]"L") -gt ([char]"l")
    False

For comparison, I tried the case-sensitive less-than operator, but it says L > l, which is the opposite of what -lt returned for chars.

    "L" -clt "l"
    False
    "l" -clt "L"
    True

How does the comparison work, given that it clearly isn't using ASCII values, and why does it behave differently for chars vs. strings?

2 answers

Many thanks to PetSerAl for all his invaluable contributions.

TL;DR

  • -lt and -gt compare [char] instances numerically, by Unicode code point.

    • Confusingly, so do -ilt, -clt, -igt, and -cgt - even though they only make sense with string operands; that is a quirk of the PowerShell language itself (see below).
  • -eq (and its alias -ieq), by contrast, compare [char] instances case-insensitively, which is usually, but not necessarily, like a case-insensitive string comparison (-ceq, again, compares strictly numerically).

    • -eq / -ieq ultimately also compare numerically, but first convert the operands to their uppercase equivalents using the invariant culture; as a result, this comparison is not fully equivalent to PowerShell's string comparison, which additionally recognizes so-called compatible sequences (distinct characters or even sequences considered to have the same meaning, see Unicode equivalence) as equal.
    • In other words: PowerShell special-cases the behavior of only -eq / -ieq with [char] operands, and does so in a way that is almost, but not quite, the same as a case-insensitive string comparison.
  • This distinction leads to counter-intuitive behavior, such as [char] 'A' -eq [char] 'a' and [char] 'A' -lt [char] 'a' both returning $true.

  • To be safe:

    • always cast to [int] if you want a numeric (Unicode code point) comparison.
    • always cast to [string] if you want a string comparison.
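
As a minimal sketch of the two casts recommended above (the variable names are only illustrative):

```powershell
$upper = [char] 'A'
$lower = [char] 'a'

# Numeric comparison of the code points: 65 vs. 97.
$numericLess = ([int] $upper) -lt ([int] $lower)

# String comparison: -lt and -eq are case-insensitive for strings.
$stringLess  = ([string] $upper) -lt ([string] $lower)
$stringEqual = ([string] $upper) -eq ([string] $lower)

"numeric -lt: $numericLess; string -lt: $stringLess; string -eq: $stringEqual"
# numeric -lt: True; string -lt: False; string -eq: True
```

With the explicit casts, the result no longer depends on how PowerShell happens to treat [char] operands for a given operator.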

For background information, read on.


PowerShell's usually helpful operator overloading can be tricky at times.

Note that in a numeric context (whether implicit or explicit), PowerShell treats characters ( [char] ( [System.Char] ) instances) numerically, by their Unicode code point (not ASCII).

    [char] 'A' -eq 65   # $true, in the 'Basic Latin' Unicode range, which coincides with ASCII
    [char] 'Ā' -eq 256  # $true; 0x100, in the 'Latin Extended-A' Unicode range

What makes [char] unusual is that its instances are compared numerically as-is, by Unicode code point, EXCEPT with -eq / -ieq.

  • -ceq, -lt and -gt compare directly by Unicode code point, and - counter-intuitively - so do -ilt, -clt, -igt and -cgt:
    [char] 'A' -lt [char] 'a'  # $true; Unicode codepoint 65 ('A') is less than 97 ('a')
  • -eq (and its alias -ieq) first converts the characters to uppercase and then compares the resulting Unicode code points:
    [char] 'A' -eq [char] 'a'  # !! ALSO $true; equivalent of 65 -eq 65
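
The two rules above can be emulated by hand. This is only a sketch of the behavior as described here, not PowerShell's actual implementation; [char]::ToUpperInvariant() is the .NET method that uppercases a character with the invariant culture:

```powershell
$upper = [char] 'A'
$lower = [char] 'a'

# -lt on [char] operands: plain code point comparison (65 vs. 97).
$lessThan = $upper -lt $lower

# -eq on [char] operands, emulated by hand: uppercase both sides with the
# invariant culture, then compare the resulting code points numerically.
$emulatedEq = ([int] [char]::ToUpperInvariant($upper)) -eq ([int] [char]::ToUpperInvariant($lower))

# What the operator itself reports.
$actualEq = $upper -eq $lower

"lt: $lessThan; emulated eq: $emulatedEq; actual eq: $actualEq"
```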

It's worth pondering this Buddhist twist: in the world of PowerShell, the character 'A' is both less than and equal to 'a', depending on how you compare.

In addition, comparing Unicode code points - directly or indirectly, after conversion to uppercase - is NOT the same as comparing the characters as strings, because PowerShell's string comparison additionally recognizes so-called compatible sequences, where characters (or even character sequences) are considered "the same" if they have the same meaning (see Unicode equivalence); e.g.:

    # Distinct Unicode characters U+2126 (Ohm Sign) and U+03A9 (Greek Capital Letter Omega)
    # ARE recognized as the "same thing" in a *string* comparison:
    "Ω" -ceq "Ω"  # $true, despite having distinct Unicode codepoints

    # -eq/-ieq: with [char], by only applying transformation to uppercase, the results
    # are still different codepoints, which - compared numerically - are NOT equal:
    [char] 'Ω' -eq [char] 'Ω'  # $false: uppercased codepoints differ

    # -ceq always applies direct codepoint comparison.
    [char] 'Ω' -ceq [char] 'Ω'  # $false: codepoints differ
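
To make the "same meaning, different code points" idea concrete, here is a small sketch that uses .NET's String.Normalize() rather than PowerShell's comparison operators. It only illustrates Unicode equivalence itself; it is not a claim about what PowerShell does internally. The characters are built from their code points so that the example does not depend on file encoding:

```powershell
$ohm   = [char] 0x2126   # OHM SIGN
$omega = [char] 0x03A9   # GREEK CAPITAL LETTER OMEGA

# As raw code points, the two characters differ.
$differentCodePoints = ([int] $ohm) -ne ([int] $omega)

# Normalization form C maps the Ohm sign to its canonical equivalent, Omega.
$normalized = ([string] $ohm).Normalize([System.Text.NormalizationForm]::FormC)
$sameAfterNormalization = ([int] $normalized[0]) -eq ([int] $omega)

"differ as code points: $differentCodePoints; same after normalization: $sameAfterNormalization"
```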

Please note that using the i or c prefixes to explicitly specify case-sensitivity behavior is NOT enough to force a string comparison, even though, conceptually, operators such as -ceq, -ieq, -clt, -ilt, -cgt, -igt only make sense with strings.

Effectively, the i and c prefixes are simply ignored when applied to -lt and -gt while comparing [char] operands; as it turns out (unlike what I originally thought), this is a general PowerShell pitfall - see the explanation below.

As an aside: the logic of -lt and -gt in string comparison is not numeric, but based on collation order (a human-centric way of ordering that is independent of code points / byte values), which in .NET terms is controlled by cultures (either, by default, the one currently in effect, or one passed to methods as a culture parameter).
As @PetSerAl demonstrates in a comment (and unlike what I originally claimed), PS string comparisons use the invariant culture rather than the current culture, so their behavior is the same, regardless of which culture is the current one.
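
The difference between collation order and code point order can be seen directly with .NET's [string]::Compare(). This is only a sketch: I am assuming that the InvariantCulture comparison is a reasonable stand-in for what the string operators do, based on the statement above. Only the sign of each result is meaningful:

```powershell
# Collation order (invariant culture): uppercase 'L' sorts AFTER lowercase 'l'.
$collation = [string]::Compare('L', 'l', [System.StringComparison]::InvariantCulture)

# Ordinal order (raw code points): 'L' (76) sorts BEFORE 'l' (108).
$ordinal = [string]::Compare('L', 'l', [System.StringComparison]::Ordinal)

$collationSign = [Math]::Sign($collation)   # 1
$ordinalSign   = [Math]::Sign($ordinal)     # -1

"collation: $collationSign; ordinal: $ordinalSign"
```

This matches the result from the question, where "l" -clt "L" is True even though l has the higher code point.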


Behind the scenes:

As @PetSerAl explains in the comments, PowerShell's parser does not distinguish between the base form of an operator and its i-prefixed form; for example, both -lt and -ilt translate to the same value, Ilt.
Thus, PowerShell cannot implement different behavior for -lt vs. -ilt, -gt vs. -igt, ... because it treats them the same way at the syntax level.
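
One way to check this yourself is to inspect the AST of a script block. The following is a sketch; Get-OperatorToken is a hypothetical helper name made up for this example, while the AST types come from System.Management.Automation.Language:

```powershell
# Hypothetical helper: returns the token kind of the first binary operator
# found in the given script block.
function Get-OperatorToken {
    param([scriptblock] $Code)

    $binary = $Code.Ast.Find(
        { param($node) $node -is [System.Management.Automation.Language.BinaryExpressionAst] },
        $true)
    $binary.Operator
}

$plain         = Get-OperatorToken { 1 -lt 2 }
$prefixed      = Get-OperatorToken { 1 -ilt 2 }
$caseSensitive = Get-OperatorToken { 1 -clt 2 }

"-lt => $plain; -ilt => $prefixed; -clt => $caseSensitive"
```

If the description above is right, the first two report the same token kind, and only the c-prefixed form gets a token of its own.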

This leads to somewhat counter-intuitive behavior, in that operator prefixes are effectively ignored when comparing data types for which case sensitivity has no meaning - rather than the operands being coerced to strings, as you might expect; e.g.:

    "10" -cgt "2"  # $false, because "2" comes after "1" in the collation order
    10 -cgt 2      # !! $true; *numeric* comparison still happens; the `c` is ignored.

In the latter case, I would have expected the use of -cgt to coerce the operands to strings, given that case-sensitive comparison is only a meaningful concept in string comparison, but that's NOT how it works.
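
If string semantics are what you want for numeric operands, the cast has to be explicit. A minimal sketch:

```powershell
# The c prefix alone does not change the operand types: still numeric.
$numeric = 10 -cgt 2

# Casting both operands first forces a string comparison.
$asStrings = ([string] 10) -cgt ([string] 2)

"numeric: $numeric; as strings: $asStrings"
# numeric: True; as strings: False
```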

If you want to delve deeper into PowerShell, see @PetSerAl's comments below.


I'm not quite sure what to post here, other than that the comparisons are all correct when working with strings / characters. If you want an ordinal comparison, do an ordinal comparison and you will get results based on that.

Best Practices for Using Strings in the .NET Framework

 [string]::Compare('L','l') returns 1 

and

 [string]::Compare("L","l", [stringcomparison]::Ordinal) returns -32 

Not sure what to add here to help clarify.

Also see: Upper and lower case


Source: https://habr.com/ru/post/1245356/

