This is less trivial than it looks. To simplify: Unicode (and therefore UTF-8) offers several ways to encode the same text. For example, ř can be written either as the single character ř or as two characters: r followed by a combining caron (ˇ).
Your best bet is the Normalizer class: normalize both strings to the same normalization form and compare the results.
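As a minimal sketch of that idea, assuming the intl extension is installed (it provides Normalizer) and PHP 7 or later for the "\u{...}" escapes, the ř example could be compared like this. The variable names are only illustrative.

```php
<?php
// Requires the intl extension, which provides the Normalizer class.

// Precomposed: U+0159 LATIN SMALL LETTER R WITH CARON.
$composed = "\u{0159}";
// Decomposed: "r" followed by U+030C COMBINING CARON.
$decomposed = "r\u{030C}";

// The raw byte strings differ, so a plain comparison fails.
var_dump($composed === $decomposed); // bool(false)

// Normalize both sides to the same form (NFC here) before comparing.
$a = Normalizer::normalize($composed, Normalizer::FORM_C);
$b = Normalizer::normalize($decomposed, Normalizer::FORM_C);

var_dump($a === $b); // bool(true)
```

Which form you pick matters less than using the same one on both sides; NFC is chosen here only because it is Normalizer's default.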
In one of the comments, you show these hexadecimal representations of the two strings:
4d696e61205469646967617265 20   616e7374 c3a4   6c6c6e696e676172
4d696e61205469646967617265 c2a0 616e7374 61cc88 6c6c6e696e676172
                           ^^^^1         ^^^^^^2
Note the parts I have marked: there appear to be two separate parts to this problem.
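First, "c2a0" is the UTF-8 encoding of a no-break space (U+00A0), where the other string has a regular space ("20"). It presumably comes from the XML. Canonical normalization (NFC or NFD) does not touch it, so in PHP it has to be replaced with a regular space, " ", before comparing.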
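Second: c3a4 is ä (U+00E4 "LATIN SMALL LETTER A WITH DIAERESIS", one character, two bytes), whereas 61 is a (U+0061 "LATIN SMALL LETTER A", one character, one byte) and cc88 is the combining umlaut ¨ (U+0308 "COMBINING DIAERESIS", one character, two bytes). This is the part that normalization takes care of.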
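To tie both parts together, here is a rough sketch that decodes the two hex dumps from the comment and makes them comparable. It assumes the intl extension is available. comparableForm() is a hypothetical helper written for this answer, not a PHP built-in.

```php
<?php
// Requires the intl extension for Normalizer.

// The two byte sequences from the hex dumps, decoded back into strings.
$first  = hex2bin('4d696e61205469646967617265' . '20'   . '616e7374' . 'c3a4'   . '6c6c6e696e676172');
$second = hex2bin('4d696e61205469646967617265' . 'c2a0' . '616e7374' . '61cc88' . '6c6c6e696e676172');

// Hypothetical helper (not part of PHP), for illustration only.
function comparableForm(string $s): string
{
    // Part 1: replace NO-BREAK SPACE (U+00A0, bytes c2 a0) with a regular space.
    $s = str_replace("\u{00A0}", ' ', $s);
    // Part 2: compose "a" + U+0308 into U+00E4 by normalizing to NFC.
    return Normalizer::normalize($s, Normalizer::FORM_C);
}

var_dump($first === $second);                                 // bool(false)
var_dump(comparableForm($first) === comparableForm($second)); // bool(true)
```

An alternative would be Normalizer::FORM_KC, which also folds the no-break space into a regular space, but it rewrites other compatibility characters as well, so the explicit replacement above is the more conservative choice.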