Why use an XML type to store XML data in SQL Server?

I am experimenting with Microsoft SQL Server. I want to store XML documents in a table. Parts of an XML document will never be changed in place inside the table (i.e. any change will be made by replacing the entire XML document in that cell).

From what I can see, I can store XML documents in a column of type xml or in varchar(max).

What are the pros and cons of each?

+6
6 answers

Yes, you can.

Now continue reading the documentation. One benefit is better searching: you can put an index on an xml column, and the xml type gives you far more XML-specific query syntax than a text column does, because xml columns parse the XML internally.

+4

The xml data type supports:

In addition, using the xml type makes it harder to make the typical mistakes that junior developers make when processing XML: treating it as a string, mixing up or ignoring encodings such as UTF-8 and UTF-16, ignoring namespaces, confusing or ignoring processing instructions, etc.
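To make the "treating it as a string" and "ignoring namespaces" mistakes concrete, here is a minimal sketch in Python rather than T-SQL. The document, the namespace URI, and the function names are all made up for illustration; the point is only that a substring check misses an element that a real parser finds.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the namespace URI and element names are invented.
DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<o:order xmlns:o="urn:example:orders"><o:id>42</o:id></o:order>'
)


def naive_has_id(xml_text: str) -> bool:
    """String-based check: it ignores namespaces, so the prefixed tag is missed."""
    return "<id>" in xml_text


def parsed_id(xml_bytes: bytes):
    """Parse the bytes (the parser honors the declared encoding) and look the
    element up by its namespace URI instead of by its literal prefix."""
    root = ET.fromstring(xml_bytes)
    return root.findtext("{urn:example:orders}id")


print(naive_has_id(DOC))
print(parsed_id(DOC.encode("utf-8")))
```

A column that only accepts parsed XML pushes you toward the second style, because the value is no longer just a bag of characters.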

Please read XML Best Practices for Microsoft SQL Server 2005.

+6

Quoted from the following SO post: Microsoft SQL Server 2005/2008: XML data type and text / varchar

If you store XML in an xml-typed column, the data will not be stored as plain text, as it is with nvarchar. It will be stored as some kind of parsed data tree, which in turn will be smaller than the unparsed XML. This not only reduces the size of the database but also gives you other benefits, such as validation and easy manipulation (even if you are not using any of them now, they are there for future use).

On the other hand, the server will have to parse the data on insert, which will probably slow your database down. You have to decide between speed and size.

Personally, I believe that data in a database should be stored as XML only when it has a structure that is difficult to implement in a relational model, e.g. layouts, style descriptions, etc. This usually means that there will not be much data and that speed is not a problem, so the added xml features, such as data validation and the ability to manipulate the data, outweigh the costs (also, last but not least, there is the ability to click on a value in Management Studio and see formatted XML; I really like this feature!).

I have no direct experience of storing large quantities of XML in a database, and I would not do it if I had the option, since it is almost always slower than the relational model. If you do have to, I recommend profiling both options and choosing the balance of size and speed that best suits your needs.

+3

1. It is based on a standard, SQLXML, so you can expect other major databases to have similar capabilities.

2. Queries can use standards such as XPath.

3. You can index the data.

4. If you have a schema, the data takes less storage, and query optimization is performed based on the type information.

0

Cons: If you store structured XML data in an xml column, replication will NOT merge changes made to that column on the publisher and the subscriber.

E.g. if the subscriber changes one XML element and the publisher changes a different element in the same xml column value, a conflict arises: one side wins, and you will have to manually recover the data that was lost.

Pros: Many desktop and web applications store their data as XML, and this maps easily onto the SQL xml data type.

0

I ran several tests to compare insert performance between untyped XML, typed XML, and NVARCHAR(MAX). I found that untyped XML was the fastest and used the least storage on disk. The test I ran inserted 7,936,510 rows. It used the XSD at https://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd .

I ran the typed XML test twice. The first run took 01:23:26.1355961. The second run took 01:15:15.5957446. The size on disk was 57,520,685,056.

The untyped XML test took 00:48:48.6290364 and the size on disk was 36,515,610,624.

The NVARCHAR(MAX) test took 00:50:22.1841067 and the size on disk was 72,620,179,456.
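To compare the three runs on a common scale, the reported figures can be turned into throughput and size per row. The sketch below only does arithmetic on the numbers quoted above. It assumes the disk figures are bytes (the unit is not stated) and that the single typed XML size applies to both typed runs. The function and variable names are mine.

```python
from datetime import timedelta

ROWS = 7_936_510  # rows inserted in each test, as reported above

# Elapsed times exactly as reported (hh:mm:ss.fraction).
TIMES = {
    "typed XML, run 1": "01:23:26.1355961",
    "typed XML, run 2": "01:15:15.5957446",
    "untyped XML": "00:48:48.6290364",
    "NVARCHAR(MAX)": "00:50:22.1841067",
}

# Reported disk sizes; assumed here to be bytes.
SIZES = {
    "typed XML": 57_520_685_056,
    "untyped XML": 36_515_610_624,
    "NVARCHAR(MAX)": 72_620_179_456,
}


def parse_elapsed(text: str) -> float:
    """Convert an hh:mm:ss.fraction string into seconds."""
    hours, minutes, seconds = text.split(":")
    delta = timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))
    return delta.total_seconds()


def rows_per_second(elapsed: str, rows: int = ROWS) -> float:
    """Average insert throughput for one run."""
    return rows / parse_elapsed(elapsed)


def size_per_row(size: int, rows: int = ROWS) -> float:
    """Average on-disk size per inserted row."""
    return size / rows


for name, elapsed in TIMES.items():
    print(f"{name}: {rows_per_second(elapsed):.0f} rows/s")
for name, size in SIZES.items():
    print(f"{name}: {size_per_row(size):.0f} per row")
```

Seen this way, the NVARCHAR(MAX) run comes out at roughly twice the untyped XML size per row, with typed XML in between.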

Note: I dropped and recreated the database for each test.

My takeaway from this is that it is better to use untyped XML instead of NVARCHAR(MAX) because it uses much less disk space. Perhaps the difference would be smaller with a non-Unicode VARCHAR. I think NVARCHAR probably uses two bytes to store each character. In addition, there is a lot of whitespace in the files, so a lot of storage is wasted. That may have something to do with it.
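Both guesses (two bytes per character, and wasted whitespace) are easy to get a feel for outside the database. This is a rough Python sketch, not a measurement of SQL Server storage: the sample record is invented, UTF-16 stands in for the NVARCHAR encoding, and the regex is a crude approximation of dropping whitespace between tags.

```python
import re

# Hypothetical, indented sample record; not taken from the test data.
SAMPLE = """<record>
    <leader>00000nam a2200000 a 4500</leader>
    <controlfield tag="001">12345</controlfield>
</record>"""


def utf16_size(text: str) -> int:
    """Bytes needed at two bytes per character (UTF-16, no byte order mark)."""
    return len(text.encode("utf-16-le"))


def utf8_size(text: str) -> int:
    """Bytes needed in UTF-8 (one byte per ASCII character)."""
    return len(text.encode("utf-8"))


def strip_whitespace_between_tags(text: str) -> str:
    """Remove whitespace that sits only between a closing '>' and the next '<'."""
    return re.sub(r">\s+<", "><", text)


compact = strip_whitespace_between_tags(SAMPLE)
print("UTF-16 bytes:", utf16_size(SAMPLE))
print("UTF-8 bytes:", utf8_size(SAMPLE))
print("UTF-16 bytes without indentation:", utf16_size(compact))
```

For plain ASCII markup like this, the two-byte encoding is exactly double the one-byte size, and the indentation adds to that on top.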

I'm not sure how much of the extra slowness of typed XML compared with untyped XML is due to validation, or whether there are other differences. If I remember correctly, I once read that the data is stored relationally in hidden tables. I'm not sure whether that applies to both typed and untyped XML.

I have not tested query performance yet. I assume queries will be faster for typed XML.

In addition, I specified that the typed XML was DOCUMENT rather than CONTENT, which is the default.

0

Source: https://habr.com/ru/post/894856/

