Convert TXT file of unknown encoding to string

How can I convert Plain Text (.txt) files to a string if the encoding type is unknown?

I am working on a feature that will allow users to import txt files into my application. This means that the file could have been created in any number of applications, using any of the many encodings that are considered valid for a plain text file. I understand that this may include ASCII, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, or even EBCDIC?!

Everything went well using the following:

NSString *txtFileAsString = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:&errorReading];

Then a user provided a file that was imported as empty content. Looking at it in the Xcode debugger, I saw Cocoa error 261 with NSStringEncoding = 4.
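For reference, Cocoa error 261 is NSFileReadInapplicableStringEncodingError, and encoding 4 is NSUTF8StringEncoding: the file's bytes could not be interpreted as UTF-8. Below is a minimal sketch of detecting that case explicitly instead of silently importing nothing. The helper names IsWrongEncodingError and ReadUTF8File are hypothetical, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when a read failed because the file's bytes
// are not valid in the encoding that was requested (Cocoa error 261).
static BOOL IsWrongEncodingError(NSError *error) {
    return [error.domain isEqualToString:NSCocoaErrorDomain] &&
           error.code == NSFileReadInapplicableStringEncodingError;
}

// Hypothetical helper: reads a file strictly as UTF-8 and reports through
// *wrongEncoding whether a failure was caused by an encoding mismatch.
static NSString *ReadUTF8File(NSString *path, BOOL *wrongEncoding) {
    NSError *error = nil;
    NSString *text = [NSString stringWithContentsOfFile:path
                                               encoding:NSUTF8StringEncoding
                                                  error:&error];
    if (wrongEncoding != NULL) {
        *wrongEncoding = (text == nil) && IsWrongEncodingError(error);
    }
    return text;
}
```

When wrongEncoding comes back as YES, the file exists and is readable, so the import code can go on to try another encoding rather than reporting a missing file.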

What I know:

  • The user's file was created using an application called knowtes
  • The file opens fine in TextEdit, TextWrangler, etc. on Mac OS X
  • The file contains "special characters" such as umlauts (rant: why doesn't "umlaut" have an umlaut in it?!)
  • Finder's Get Info reports:

View: text

text / plain; encoding = UTF-16LE

I assume the file's UTF-16LE encoding is the key, since my code expects UTF-8. I tried using ASCII as a lowest common denominator. It did not crash, but the resulting string is corrupted by characters that are not in the source file.

NSString *txtFileAsString = [NSString stringWithContentsOfFile:path encoding:NSASCIIStringEncoding error:&errorReading];

So, I tried to convert the file to NSData first, hoping that this might negate the need for encoding recognition. This did not work.

    NSData *txtFileData = [NSData dataWithContentsOfFile:path];
    NSString *txtFileAsString = [[NSString alloc]initWithData:txtFileData encoding:NSUTF8StringEncoding];

This leads me to a few questions:

  • Plain Text ( )? , initWithContentsOfFile, , , . ASCIStringEncoding .
  • - NSUTF16 , , NSUTF8?
  • , UTF16LE, ?

    NSString *txtFileAsString = nil;
    if (path != nil) {
        NSData *txtFileData = [NSData dataWithContentsOfFile:path];
        // Assign to the outer variable. Redeclaring it here would shadow it
        // and leave the outer txtFileAsString nil after the block.
        txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSASCIIStringEncoding];
        // Each fallback runs only if every earlier attempt returned nil.
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF8StringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16StringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16LittleEndianStringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16BigEndianStringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32StringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32LittleEndianStringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32BigEndianStringEncoding];
        }
    }
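
A chain of trial decodes like this depends on each wrong encoding returning nil, but a decoder may also hand back a non-nil string that is simply wrong. One complementary check is to look for a byte order mark before guessing. The following is only a sketch under that assumption: EncodingFromBOM is a hypothetical helper, and it cannot help with files that carry no BOM.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the encoding implied by a leading byte order
// mark, or 0 when the data does not start with one.
static NSStringEncoding EncodingFromBOM(NSData *data) {
    const unsigned char *b = data.bytes;
    NSUInteger n = data.length;
    // UTF-32 is checked before UTF-16 because the UTF-32LE mark
    // (FF FE 00 00) begins with the UTF-16LE mark (FF FE).
    if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
        return NSUTF32LittleEndianStringEncoding;
    if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
        return NSUTF32BigEndianStringEncoding;
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return NSUTF8StringEncoding;
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return NSUTF16LittleEndianStringEncoding;
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return NSUTF16BigEndianStringEncoding;
    return 0; // no BOM found; the caller has to fall back to something else
}
```

A non-zero result can be passed straight to initWithData:encoding:. Depending on the encoding constant used, the decoded string may still begin with U+FEFF, so the caller may want to strip that character.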
    

Use stringWithContentsOfFile:usedEncoding:error:, which tries to determine the encoding for you:

NSError *error;
NSStringEncoding encoding;
NSString *string = [NSString stringWithContentsOfFile:path usedEncoding:&encoding error:&error];

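If the file is read successfully, the encoding that was used to interpret it is returned through the usedEncoding parameter, here the variable encoding.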
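
Detection can still fail, in which case the method returns nil and sets the error. Here is a hedged sketch of wrapping the call with an explicit fallback list. ReadTextFile is a hypothetical helper, the choice and order of the fallback encodings are assumptions that should be adapted to the files your users actually have, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: lets Foundation detect the encoding first, then tries
// a fixed list of encodings. The fallback list and its order are assumptions.
static NSString *ReadTextFile(NSString *path, NSStringEncoding *usedEncoding) {
    NSStringEncoding detected = 0;
    NSError *error = nil;
    NSString *text = [NSString stringWithContentsOfFile:path
                                           usedEncoding:&detected
                                                  error:&error];
    if (text == nil) {
        NSData *data = [NSData dataWithContentsOfFile:path];
        if (data == nil) {
            return nil; // the file is unreadable, which is not an encoding problem
        }
        const NSStringEncoding fallbacks[] = {
            NSUTF8StringEncoding,
            NSWindowsCP1252StringEncoding,
        };
        size_t count = sizeof fallbacks / sizeof fallbacks[0];
        for (size_t i = 0; i < count && text == nil; i++) {
            detected = fallbacks[i];
            text = [[NSString alloc] initWithData:data encoding:detected];
        }
    }
    if (text != nil && usedEncoding != NULL) {
        *usedEncoding = detected; // report which encoding produced the string
    }
    return text;
}
```

Calling ReadTextFile(path, &encoding) returns nil only when the file cannot be read or none of the attempts produced a string, so an empty import can be reported to the user instead of passing silently.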


Source: https://habr.com/ru/post/1598130/

