VARCHAR compression in SQL Server 2008/2012 - no results

I experimented with compression in SQL Server, but so far I have not seen the expected results.

To test it, I created a new table with a single VARCHAR(8000) column and inserted 100k rows into it. Each row contains about 500 words of text, which ZIP compression shrinks by more than 90%.

I use the command EXEC sp_estimate_data_compression_savings 'dbo', 'MyTable', NULL, NULL, 'PAGE'; to check how much space PAGE compression would save, but it tells me there will be practically no savings. The results are as follows:

 object_name: MyTable
 schema_name: dbo
 index_id: 0
 partition_number: 1
 size_with_current_compression_setting(KB): 94048
 size_with_requested_compression_setting(KB): 93440
 sample_size_with_current_compression_setting(KB): 40064
 sample_size_with_requested_compression_setting(KB): 39808

That is essentially no saving at all. What am I missing?

P.S. I tried the same experiment with an NVARCHAR(4000) column, and compression does show savings there, but I believe that is because the compression stores one byte per character instead of two wherever the data does not need two bytes. It does not actually compress the data the way ZIP does.
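One way to check that suspicion is to ask for a ROW estimate on the NVARCHAR table as well. This is only a sketch: the table name MyNTable is hypothetical, standing in for whatever the NVARCHAR(4000) test table is called. Unicode compression is applied under both ROW and PAGE compression (from SQL Server 2008 R2 on), so if the ROW estimate is about the same as the PAGE estimate, the savings most likely come from the one-byte-per-character storage rather than from prefix or dictionary compression.

```sql
-- Hypothetical table name: dbo.MyNTable (the NVARCHAR(4000) test table).
-- Estimate savings with ROW compression only.
EXEC sp_estimate_data_compression_savings 'dbo', 'MyNTable', NULL, NULL, 'ROW';

-- Estimate savings with PAGE compression (ROW plus prefix/dictionary).
EXEC sp_estimate_data_compression_savings 'dbo', 'MyNTable', NULL, NULL, 'PAGE';
```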

+4
2 answers

If the data is pushed off-row (which is likely to happen with a VARCHAR(8000) column), you will not get any compression. Only in-row data is compressed:

Because of their size, large-value data types are sometimes stored separately from the normal row data on special purpose pages. Data compression is not available for the data that is stored separately.
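To see whether this actually applies to a given table, you can look at how its pages are split across allocation units. The query below is a minimal sketch that assumes the table is dbo.MyTable, as in the question. Pages reported under ROW_OVERFLOW_DATA or LOB_DATA are stored off-row and are not touched by data compression; pages under IN_ROW_DATA are the only ones that can shrink.

```sql
-- Pages per allocation unit type for dbo.MyTable.
SELECT p.index_id,
       p.partition_number,
       au.type_desc,        -- IN_ROW_DATA, ROW_OVERFLOW_DATA or LOB_DATA
       au.total_pages,
       au.used_pages,
       au.data_pages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
    -- In-row and row-overflow units are keyed by hobt_id,
    -- LOB units are keyed by partition_id.
    ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
    OR (au.type = 2 AND au.container_id = p.partition_id)
WHERE p.object_id = OBJECT_ID(N'dbo.MyTable');
```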

+2

Page compression in SQL Server uses prefix and dictionary methods to compress data. It cannot (and you would not want it to) look at the entire data set to work out the best compression. It can only look at one data page at a time. The best results are achieved when each row on a page differs as little as possible from the rows before it. The only way to achieve this is to force SQL Server to physically arrange the rows on each page so that they change as little as possible from row to row. We can do this by creating a clustered index on a field or set of fields that guarantees the physical layout of the rows will change the least from one row to the next.
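As a rough sketch of that idea: the column SortKey and the index name below are hypothetical, since the table in the question has only the one text column, and a VARCHAR(8000) column is not a practical index key because index keys are limited to 900 bytes. Assuming the table had some column whose order groups similar rows together, it could be clustered and page-compressed like this, and the estimate re-run against the clustered index (index_id 1):

```sql
-- Hypothetical: SortKey is a column whose order places similar rows
-- next to each other on the same page.
CREATE CLUSTERED INDEX IX_MyTable_SortKey
    ON dbo.MyTable (SortKey)
    WITH (DATA_COMPRESSION = PAGE);

-- Check the size afterwards.
EXEC sp_spaceused N'dbo.MyTable';

-- Or, before compressing, estimate the savings for the clustered index.
EXEC sp_estimate_data_compression_savings 'dbo', 'MyTable', 1, NULL, 'PAGE';
```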

In the example you describe, a bunch of words in a single field, decent compression may not be achievable. It sounds like paragraphs of text, and those will be very different from one another regardless of how they are physically arranged.

The method used by SQL Server to compress data allows you to retrieve the contents of any row without having to unpack the entire page.

+1

Source: https://habr.com/ru/post/1402480/

