Is it better to compare strings using toLowerCase or toUpperCase in JavaScript?

I am looking at a code review and I'm curious if it is better to convert strings to upper or lower case in JavaScript when trying to compare them while ignoring the case.

Trivial example:

    var firstString = "I might be A different CASE";
    var secondString = "i might be a different case";
    var areStringsEqual = firstString.toLowerCase() === secondString.toLowerCase();

or should I do this:

    var firstString = "I might be A different CASE";
    var secondString = "i might be a different case";
    var areStringsEqual = firstString.toUpperCase() === secondString.toUpperCase();

It seems that either "should" work fine with a limited character set, such as English letters only, so is one more robust than the other?

As a side note, MSDN recommends normalizing strings to uppercase, but that is for managed code (presumably C# and F#, which have fancy StringComparers and base libraries): http://msdn.microsoft.com/en-us/library/bb386042.aspx

2 answers

Revised Answer

It has been quite a while since I answered this question. While the cultural issues still hold true (and I don't think they will ever go away), the development of the ECMA-402 standard has made my original answer somewhat outdated (or obsolete?).

The best solution for comparing localized strings seems to be the localeCompare() function with the appropriate locale and options:

    var locale = 'en'; // that should be somehow detected and passed on to JS
    var firstString = "I might be A different CASE";
    var secondString = "i might be a different case";
    if (firstString.localeCompare(secondString, locale, {sensitivity: 'accent'}) === 0) {
        // do something when equal
    }

This lets you compare two strings case-insensitively but accent-sensitively (for example, ą != a).
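
As a rough sketch of the same idea, an Intl.Collator (also defined by ECMA-402) can be created once and reused for many comparisons. The names `makeCaseInsensitiveEquals` and `equalsIgnoreCase`, and the hard-coded 'en' locale, are assumptions made for illustration only:

```javascript
// Hypothetical factory: builds a case-insensitive, accent-sensitive
// equality check for one locale. The collator is created only once.
function makeCaseInsensitiveEquals(locale) {
  const collator = new Intl.Collator(locale, { sensitivity: 'accent' });
  return (a, b) => collator.compare(a, b) === 0;
}

// 'en' is an assumption; in real code the locale should come from the user.
const equalsIgnoreCase = makeCaseInsensitiveEquals('en');

console.log(equalsIgnoreCase('I might be A different CASE',
                             'i might be a different case')); // true
console.log(equalsIgnoreCase('ą', 'a'));                      // false
```
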
If this is not sufficient for performance reasons, you may want to use either
toLocaleUpperCase() or toLocaleLowerCase(), passing the locale as a parameter:

    if (firstString.toLocaleUpperCase(locale) === secondString.toLocaleUpperCase(locale)) {
        // do something when equal
    }

There should be no differences in theory. In practice, subtle implementation details (or lack of implementation in a given browser) can give different results ...
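
To make the role of the locale parameter concrete, here is a small sketch using the Turkish dotted and dotless i. The helper name `equalsInLocale` is hypothetical, and the results assume a runtime with ECMA-402 (Intl) support, such as a current browser or Node.js:

```javascript
// Hypothetical helper: compare two strings after locale-aware upper-casing.
function equalsInLocale(a, b, locale) {
  return a.toLocaleUpperCase(locale) === b.toLocaleUpperCase(locale);
}

// In Turkish, lowercase "i" upper-cases to dotted "İ" (U+0130), not "I".
console.log('i'.toLocaleUpperCase('tr') === '\u0130'); // true
console.log('i'.toLocaleUpperCase('en') === 'I');      // true

// So the same pair of strings can match in one locale and not in another.
console.log(equalsInLocale('title', 'TITLE', 'en')); // true
console.log(equalsInLocale('title', 'TITLE', 'tr')); // false
```

This is why the locale has to be the one the text was actually written in, not just the locale of the machine running the code.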

Original answer

I'm not sure whether you really meant to ask this question under the Internationalization (i18n) tag, but since you did...
Probably the most unexpected answer is: neither.

There are many problems with case conversion, and they inevitably lead to functional issues if you convert character case without specifying the language (as is the case in JavaScript). For instance:

  1. There are many natural languages that have no concept of upper- and lowercase letters. It makes no sense to try to convert them (although this will work).
  2. There are language-specific rules for converting strings. The German sharp s character (ß) must be converted to two capital S letters (SS).
  3. Turkish and Azerbaijani (or Azeri, if you prefer) have a "very strange" concept of two i characters: dotless ı (which converts to uppercase I) and dotted i (which converts to uppercase İ; some fonts do not render it properly, but it really is a different glyph).
  4. Greek has many "strange" conversion rules. One specific rule concerns the capital letter sigma (Σ), which, depending on its place in the word, has two lowercase counterparts: regular sigma (σ) and final sigma (ς). There are also other conversion rules for "accented" characters, but they are usually omitted when the conversion function is implemented.
  5. Some languages have title-case letters, e.g. ǈ, which should be converted to things like Ǉ or, less appropriately, LJ. The same can be said of ligatures.
  6. Finally, there are many compatibility characters that may mean the same as what you are trying to compare with, but are composed of completely different characters. Even worse, things like "ae" may be equivalent to "ä" in German and Finnish, but equivalent to "æ" in Danish.
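
A few of the points above can be observed directly with the locale-independent toLowerCase() and toUpperCase(). The sketch below is only illustrative; the helper name `compareBothWays` is hypothetical, and the results follow the default Unicode case mappings that ECMAScript specifies:

```javascript
// Hypothetical helper: report whether two strings match after each conversion.
function compareBothWays(a, b) {
  return {
    lower: a.toLowerCase() === b.toLowerCase(),
    upper: a.toUpperCase() === b.toUpperCase(),
  };
}

// German sharp s: "ß" upper-cases to "SS", but "SS" lower-cases to "ss".
console.log(compareBothWays('straße', 'STRASSE')); // lower: false, upper: true

// Greek: regular and final sigma share a single uppercase letter.
console.log(compareBothWays('σ', 'ς')); // lower: false, upper: true

// Title-case digraph U+01C8 has distinct upper (U+01C7) and lower (U+01C9) forms.
console.log('\u01C8'.toUpperCase() === '\u01C7'); // true
console.log('\u01C8'.toLowerCase() === '\u01C9'); // true

// Dotted capital "İ" (U+0130) lower-cases to two code units: "i" plus U+0307.
console.log('\u0130'.toLowerCase().length); // 2
```

In these cases the two conversions disagree with each other, so neither one can be treated as a reliable normalization for arbitrary text.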

I am trying to convince you that it is really better to compare user input literally rather than converting it. If it is not user-related, it probably doesn't matter, but case conversion will always take time. Why bother?


It never depends on the browser, as only JavaScript is involved. Both will perform according to the number of characters that need to be changed (that is, have their case flipped).

    var areStringsEqual = firstString.toLowerCase() === secondString.toLowerCase();
    var areStringsEqual = firstString.toUpperCase() === secondString.toUpperCase();

If you use the test prepared by @adeneo, you may feel it is browser dependent, but try some other test inputs, such as "AAAAAAAAAAAAAAAAAAAAAAAAAAAAA" and "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", and compare.

JavaScript performance depends on the browser when a DOM API or any manipulation of, or interaction with, the DOM is involved; otherwise, plain JavaScript will perform the same everywhere.
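
One way to try the suggestion above is a crude timing loop. This is only a sketch: the function name `timeComparison`, the iteration count, and the inputs are assumptions, and micro-benchmarks like this are noisy, so the numbers mean little on their own.

```javascript
// Hypothetical micro-benchmark: time repeated case-insensitive comparisons.
// `convert` is the case-conversion function under test.
function timeComparison(convert, a, b, iterations = 100000) {
  let equalCount = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    if (convert(a) === convert(b)) equalCount++;
  }
  return { ms: performance.now() - start, equalCount };
}

// Assumed inputs: one all-uppercase and one all-lowercase string.
const upperInput = 'A'.repeat(30);
const lowerInput = 'a'.repeat(30);

console.log('toLowerCase:', timeComparison((s) => s.toLowerCase(), upperInput, lowerInput));
console.log('toUpperCase:', timeComparison((s) => s.toUpperCase(), upperInput, lowerInput));
```

Running it in several browsers, with both inputs swapped, is a simple way to check the claim for yourself.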


Source: https://habr.com/ru/post/978058/

