Strange behavior in .NET Directory.GetFiles() when the search pattern has a three-character extension

Recently, I came across some unusual behavior from Microsoft:

Suppose C:\tmp123 contains 3 files:
1.txt
2.txtx
3.txtxt

a) A call to Directory.GetFiles(@"C:\tmp123", "*.txt") returns 3 items.
b) A call to Directory.GetFiles(@"C:\tmp123", "*.txtx") returns 1 item.

According to Microsoft, this is the expected behavior (see the note on MSDN).

My questions:

  • Why did Microsoft decide on such odd behavior?

  • How can I work around it?
    For example, how do I write a search pattern that returns only files with the .txt extension and not *.txtx, *.txtstrangefunctionality, etc.?

+6
4 answers

The reason for this is backward compatibility.

Windows was originally built as a graphical interface on top of MS-DOS, which limited file names to 8 characters plus an extension of at most 3. Later file system extensions allowed Windows to use longer names and extensions, but those files still appeared under 8.3 names in MS-DOS.

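Because the Windows command line evolved from the old MS-DOS command interpreter, some “anachronistic” behaviors (such as the three-letter search pattern) were kept so that applications and scripts written in the “old days”, or by “old-timers”, do not break. Here, the search checks each file against both its long name and its 8.3 short name (on volumes where short names are generated); the short name of 2.txtx has the extension .TXT, so *.txt matches it.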

(Another example is that most Windows file systems are not case sensitive. Yes, you guessed it: MS-DOS did not distinguish case in file names.)
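
As a rough illustration of why the three-letter pattern behaves this way, here is a minimal sketch. It is not how Windows implements matching; it only assumes that a "*.ext" search succeeds when either the long extension or a three-character short (8.3) extension matches. The names LegacyMatchSketch, ShortExtension and MatchesStarDotExt are hypothetical.

```csharp
using System;
using System.IO;

public static class LegacyMatchSketch
{
    // Simplifying assumption: the 8.3 alias keeps only the first three
    // characters of the long extension. Real short-name generation also
    // shortens the base name, which this sketch ignores.
    public static string ShortExtension(string fileName)
    {
        string ext = Path.GetExtension(fileName);           // includes the dot, e.g. ".txtx"
        return ext.Length > 4 ? ext.Substring(0, 4) : ext;  // dot + at most 3 characters
    }

    // Assumed rule: "*.ext" matches if the long OR the short extension equals ext.
    public static bool MatchesStarDotExt(string fileName, string extension)
    {
        return string.Equals(Path.GetExtension(fileName), extension, StringComparison.OrdinalIgnoreCase)
            || string.Equals(ShortExtension(fileName), extension, StringComparison.OrdinalIgnoreCase);
    }
}
```

Under this assumption, ".txt" matches all of 1.txt, 2.txtx and 3.txtxt (their short extension is ".txt"), while ".txtx" matches only 2.txtx, which agrees with the counts reported in the question.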

+2

If you want a workaround, you can just get all the file paths

 var files = Directory.GetFiles(@"C:\tmp123"); 

and then filter them as needed

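 // Compare case-insensitively, since Windows file names are not case sensitive
 var txtFiles = files.Where(f => f.EndsWith(".txt", StringComparison.OrdinalIgnoreCase));
 var txtxFiles = files.Where(f => f.EndsWith(".txtx", StringComparison.OrdinalIgnoreCase));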
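
The filter above can also be wrapped in a small reusable helper. This is only a sketch: ExactExtensionSearch and GetFilesWithExactExtension are hypothetical names, not part of .NET, and the extension argument is assumed to include the leading dot. It relies on Directory.EnumerateFiles and Path.GetExtension, and compares the real extension rather than trusting the search pattern.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ExactExtensionSearch
{
    // Hypothetical helper: returns only the files in `directory` whose
    // extension equals `extension` exactly (e.g. ".txt"), ignoring case.
    public static IEnumerable<string> GetFilesWithExactExtension(string directory, string extension)
    {
        // Enumerate every file, then check the long-name extension ourselves
        // instead of relying on the "*.ext" pattern matcher.
        return Directory.EnumerateFiles(directory)
            .Where(path => string.Equals(Path.GetExtension(path), extension,
                                         StringComparison.OrdinalIgnoreCase));
    }
}
```

For the directory in the question, ExactExtensionSearch.GetFilesWithExactExtension(@"C:\tmp123", ".txt") would yield only the path of 1.txt.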
+1

I agree that this is about backward compatibility. I don't see this exact issue mentioned, but this Raymond Chen blog post describes a number of oddities in this area:

[...] some quirks of the FCB matching algorithm persist into Win32 because they have become idiom.

For example, if your pattern ends in .*, the .* is ignored. Without this rule, the pattern *.* would match only files that contained a dot, which would break probably 90% of all the batch files on the planet, as well as everybody's muscle memory, since everybody running Windows NT 3.1 grew up in a world where *.* meant all files.

As another example, a pattern that ends in a dot doesn't actually match files which end in a dot; it matches files with no extension. And a question mark can match zero characters if it comes immediately before a dot.

0

Here is another workaround that filters out files with extensions such as ".txtxt":

 // DirectoryInfo.GetFiles returns FileInfo objects, which expose the Extension property
 var files = new System.IO.DirectoryInfo(@"C:\tmp123").GetFiles("*.txt").Where(item => item.Extension.ToLower() == ".txt");
0

Source: https://habr.com/ru/post/905597/