Please note that xD83D is a high surrogate. A surrogate pair consists of a high surrogate followed by a low surrogate; two high surrogates next to each other are not a "surrogate pair", they are nonsense.
Also note that the correct way to represent a non-BMP character in XML is a single character reference for the whole character, for example 𒂫 . Splitting a non-BMP character into two surrogates is necessary in some character encodings, but it is not needed (or allowed) in XML character references. Character references in XML are Unicode code points, not numeric values specific to a particular character encoding.
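To make the distinction concrete, here is a small Python sketch. The character U+1F600 and the helper name `utf16_code_units` are my own choices for illustration, not something from the question. It shows that the XML reference is built from the code point, while the surrogates only appear once the character is encoded as UTF-16.

```python
def utf16_code_units(ch: str) -> list[str]:
    """Return the UTF-16 code units of ch as 4-digit hex strings."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return [f"{int.from_bytes(data[i:i + 2], 'big'):04X}"
            for i in range(0, len(data), 2)]


ch = "\U0001F600"  # illustrative non-BMP character (assumption)

# The XML character reference uses the Unicode code point directly.
xml_ref = f"&#x{ord(ch):X};"

# The surrogates are an artifact of the UTF-16 encoding only.
units = utf16_code_units(ch)

print(xml_ref)  # &#x1F600;
print(units)    # ['D83D', 'DE00']
```

Note that the first UTF-16 code unit here is the D83D from the question, and it is followed by a low surrogate, not by another high one.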
If you cannot fix the program that created this bad XML, the best workaround would be to repair it with a script, for example in Perl, which looks for invalid pairs of character references and replaces them with the correct XML representation.
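Such a repair script could look roughly like the following. It is written in Python rather than Perl, and it is only a sketch under some assumptions: the bad references are hexadecimal (as in xD83D), a high-surrogate reference is immediately followed by a low-surrogate reference, and the function name `repair_surrogate_refs` is made up for this example.

```python
import re

# A hex reference to a high surrogate (D800-DBFF) immediately followed
# by a hex reference to a low surrogate (DC00-DFFF).
_PAIR = re.compile(
    r"&#x0*(D[89AB][0-9A-F]{2});&#x0*(D[C-F][0-9A-F]{2});",
    re.IGNORECASE,
)


def _combine(match: re.Match) -> str:
    """Turn one surrogate pair of references into a single reference."""
    high = int(match.group(1), 16)
    low = int(match.group(2), 16)
    code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    return f"&#x{code_point:X};"


def repair_surrogate_refs(text: str) -> str:
    """Replace surrogate-pair character references with the real code point."""
    return _PAIR.sub(_combine, text)


print(repair_surrogate_refs("<a>&#xD83D;&#xDE00;</a>"))  # <a>&#x1F600;</a>
```

Two high surrogates in a row, or a lone surrogate, are left untouched by this sketch, because there is no way to know which character was intended. Those cases still have to be fixed at the source.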