The first three bytes are skipped because the RTL code assumes that the file starts with a UTF-8 BOM (byte order mark). Clearly, your file does not.
The TUTF8Encoding class implements the GetPreamble method, which returns the UTF-8 BOM. ReadAllText skips the preamble reported by the encoding you pass, without checking that those bytes are actually present in the file.
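To illustrate the effect, here is a minimal sketch (not the actual RTL source) of a decode step that unconditionally skips as many bytes as the encoding's preamble is long. The program name and the DecodeSkippingPreamble helper are hypothetical and exist only for this demonstration.

```delphi
program PreambleSkipDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: decodes Bytes after blindly skipping the number of
// bytes that Encoding.GetPreamble reports, whether or not they are a BOM.
function DecodeSkippingPreamble(const Bytes: TBytes; Encoding: TEncoding): string;
var
  PreambleLen: Integer;
begin
  PreambleLen := Length(Encoding.GetPreamble); // 3 for TEncoding.UTF8
  Result := Encoding.GetString(Bytes, PreambleLen, Length(Bytes) - PreambleLen);
end;

var
  Bytes: TBytes;
begin
  // GetBytes produces plain UTF-8 data with no BOM in front.
  Bytes := TEncoding.UTF8.GetBytes('Hello');
  // The first three data bytes are lost, so this prints "lo".
  Writeln(DecodeSkippingPreamble(Bytes, TEncoding.UTF8));
end.
```

With an encoding whose GetPreamble returns an empty array, the same helper would skip nothing.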
One simple solution would be to read the file into a byte array, and then use TEncoding.UTF8.GetString to decode it to a string.
    var
      Bytes: TBytes;
      Str: string;
    ....
    Bytes := TFile.ReadAllBytes(FileName);
    Str := TEncoding.UTF8.GetString(Bytes);
A more general alternative is to create a TEncoding descendant that reports no preamble, so that no UTF-8 BOM is assumed.
    type
      TUTF8EncodingWithoutBOM = class(TUTF8Encoding)
      public
        function Clone: TEncoding; override;
        function GetPreamble: TBytes; override;
      end;

    function TUTF8EncodingWithoutBOM.Clone: TEncoding;
    begin
      Result := TUTF8EncodingWithoutBOM.Create;
    end;

    function TUTF8EncodingWithoutBOM.GetPreamble: TBytes;
    begin
      Result := nil;
    end;
Instantiate this class (you only need one instance per process) and pass the instance to TFile.ReadAllText.
The advantage of using a single instance of TUTF8EncodingWithoutBOM is that you can use it anywhere TEncoding is expected.
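As a rough sketch of how that single instance might be set up, the unit below creates it once at start-up and frees it at shutdown. The unit name, the UTF8WithoutBOM accessor, and the ReadUTF8TextWithoutBOM helper are hypothetical names chosen for illustration. Only the class itself comes from the answer above.

```delphi
unit UTF8NoBOMHelper; // hypothetical unit name

interface

uses
  System.SysUtils, System.IOUtils;

type
  // Same class as above: a UTF-8 encoding that reports no preamble.
  TUTF8EncodingWithoutBOM = class(TUTF8Encoding)
  public
    function Clone: TEncoding; override;
    function GetPreamble: TBytes; override;
  end;

// Hypothetical accessor for the single shared instance.
function UTF8WithoutBOM: TEncoding;

// Hypothetical helper: reads a BOM-less UTF-8 file without dropping bytes.
function ReadUTF8TextWithoutBOM(const FileName: string): string;

implementation

var
  SharedEncoding: TEncoding; // created once, owned by this unit

function TUTF8EncodingWithoutBOM.Clone: TEncoding;
begin
  Result := TUTF8EncodingWithoutBOM.Create;
end;

function TUTF8EncodingWithoutBOM.GetPreamble: TBytes;
begin
  Result := nil; // empty preamble, so nothing is skipped on read
end;

function UTF8WithoutBOM: TEncoding;
begin
  Result := SharedEncoding;
end;

function ReadUTF8TextWithoutBOM(const FileName: string): string;
begin
  Result := TFile.ReadAllText(FileName, SharedEncoding);
end;

initialization
  SharedEncoding := TUTF8EncodingWithoutBOM.Create;

finalization
  SharedEncoding.Free;

end.
```

Callers would then write something like Str := ReadUTF8TextWithoutBOM(FileName), or pass UTF8WithoutBOM to any other routine that takes a TEncoding parameter.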