I ran several tests to compare insert performance between untyped XML, typed XML, and NVARCHAR(MAX). I found that untyped XML was the fastest and used the least storage on disk. Each test inserted 7,936,510 rows. The typed XML used the XSD at https://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd .
I ran the typed XML test twice. The first run took 01:23:26.1355961 and the second took 01:15:15.5957446. The size on disk was 57,520,685,056.
The untyped XML test took 00:48:48.6290364 and the size on disk was 36,515,610,624.
The NVARCHAR(MAX) test took 00:50:22.1841067 and the size on disk was 72,620,179,456.
Note: I dropped and recreated the database for each test.
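To make the figures above easier to compare, here is a small Python sketch that converts the elapsed times to seconds and derives rows per second and storage per row. It only does arithmetic on the numbers reported above. The names `parse_elapsed` and `summarize` are made up for this illustration, and it assumes the sizes are in bytes and that the typed XML size applies to both typed runs.

```python
# Arithmetic on the reported results only; nothing here talks to SQL Server.
ROWS = 7_936_510

# Elapsed times as reported (hh:mm:ss.fffffff); typed XML was run twice.
TIMES = {
    "typed XML": ["01:23:26.1355961", "01:15:15.5957446"],
    "untyped XML": ["00:48:48.6290364"],
    "NVARCHAR(MAX)": ["00:50:22.1841067"],
}

# Size on disk as reported. Assumption: the unit is bytes.
SIZES = {
    "typed XML": 57_520_685_056,
    "untyped XML": 36_515_610_624,
    "NVARCHAR(MAX)": 72_620_179_456,
}


def parse_elapsed(text):
    """Convert 'hh:mm:ss.fffffff' to seconds as a float."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def summarize(times=TIMES, sizes=SIZES, rows=ROWS):
    """Return average rows/second and bytes/row for each storage type."""
    summary = {}
    for name, runs in times.items():
        avg_seconds = sum(parse_elapsed(r) for r in runs) / len(runs)
        summary[name] = {
            "avg_seconds": avg_seconds,
            "rows_per_second": rows / avg_seconds,
            "bytes_per_row": sizes[name] / rows,
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize().items():
        print(
            f"{name:14s} {stats['avg_seconds']:9.1f} s  "
            f"{stats['rows_per_second']:7.0f} rows/s  "
            f"{stats['bytes_per_row']:8.1f} bytes/row"
        )
    ratio = SIZES["NVARCHAR(MAX)"] / SIZES["untyped XML"]
    print(f"NVARCHAR(MAX) / untyped XML size ratio: {ratio:.2f}")
```

With these inputs the NVARCHAR(MAX) database comes out at a little under twice the size of the untyped XML one, while the two insert times differ by only a few percent.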
My takeaway from this is that untyped XML is a better choice than NVARCHAR(MAX) because it uses far less disk space. If you used non-Unicode VARCHAR instead, the difference would probably be smaller, since NVARCHAR presumably uses two bytes to store each character. There is also a lot of whitespace in the files, which means a lot of wasted storage, so that could be part of it too.
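As a rough illustration of those two effects (two bytes per character, and whitespace between elements), the Python sketch below measures a small MARCXML-style fragment encoded as UTF-16 and as a single-byte encoding, with and without the indentation. The sample record and the helper names are invented for this example, and the byte counts describe plain text encodings in Python, not how SQL Server lays out XML or NVARCHAR data internally.

```python
import re

# Hypothetical, hand-written MARCXML-style record; not taken from the test data.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>
"""


def strip_inter_tag_whitespace(xml_text):
    """Remove whitespace that sits only between a closing '>' and the next '<'."""
    return re.sub(r">\s+<", "><", xml_text).strip()


def byte_sizes(xml_text):
    """Compare encoded sizes with and without the indentation."""
    compact = strip_inter_tag_whitespace(xml_text)
    return {
        # UTF-16 uses two bytes for each of these (ASCII) characters.
        "utf16_as_is": len(xml_text.encode("utf-16-le")),
        # A single-byte encoding uses one byte per character.
        "single_byte_as_is": len(xml_text.encode("latin-1")),
        "utf16_compact": len(compact.encode("utf-16-le")),
        "single_byte_compact": len(compact.encode("latin-1")),
    }


if __name__ == "__main__":
    for label, size in byte_sizes(SAMPLE).items():
        print(f"{label:20s} {size:5d} bytes")
```

For text like this, the UTF-16 size is exactly double the single-byte size, and dropping the indentation shrinks both. How much of the measured difference each effect explains would have to be checked against the real files.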
I'm not sure how much of the extra slowness of typed XML compared with untyped XML is due to validation, or whether there are other differences. If I remember correctly, I once read that the data is stored relationally in hidden tables. I'm not sure whether that applies to both typed and untyped XML.
I have not tested query performance yet. I assume it will be faster for typed XML.
Also, I declared the typed XML as DOCUMENT rather than CONTENT, which is the default.