Narrowing data types on a very large table

I have a table with over 150 million records. The following types currently exist:

 id        (Primary Key, BIGINT)
 idResult  (Foreign Key, BIGINT NULL)
 idItem    (Foreign Key, BIGINT NULL)
 Number_1  (BIGINT NULL)
 Number_2  (BIGINT NULL)
 IsActive  (BIGINT NULL)

Number_1 and Number_2 can never be greater than 10, and IsActive is clearly boolean. None of these columns is ever null anywhere in the code base. I would also like to change the foreign key columns to INT, but that is another story. This table was built a few years before I started, and we are experiencing some growing pains.

I am looking for the best way to convert these columns (and several others in other tables, though this one is among the worst offenders) and reclaim that disk space. I tried a direct Alter Table , but, as expected, it simply didn't work. I can't recall the exact error, but I believe it was related to the size of the table in question.

I'm currently considering manually dropping and re-creating the table, but I'm struggling to find a better way to do it than Select TOP 10000 * FROM dbo.TABLENAME WHERE id > 0 , incrementing the WHERE clause over and over.

I looked at Switch To , but partition switching requires the data types of the target table to match the source table exactly, and the data types are exactly what I'm trying to fix!

Any suggestions? Am I considering this wrong?

2 answers

First of all, kudos for tackling this. It is such an obvious win that many won't see much value in it, but it is worth doing :). It makes the world a little better.

Regarding IsActive being boolean: I assume you are thinking of making it a BIT field. That may be the way to go, but sometimes it is better to go with TINYINT , since it leaves room to expand the value to more than two states; in that case it really becomes more of a StatusID . This is typically something that starts out simple, like Active / Inactive , but later grows a Deleted and/or other states. Size-wise, TINYINT is always 1 byte. BIT , on the other hand, is 1 byte for up to 8 BIT fields: one BIT field takes 1 byte, 2 BIT fields also take one byte, and so on, with up to 8 BIT fields stored in a single byte. So there is no space saving in choosing BIT over TINYINT when the table has only one BIT column. Just something to consider.
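For illustration, a minimal sketch of the TINYINT approach (the table, constraint names, and status codes here are hypothetical, not from the question):

```sql
-- Hypothetical sketch: a TINYINT status column that starts as Active/Inactive
-- but leaves room for more states later (e.g. Deleted = 2).
ALTER TABLE dbo.TABLENAME
    ADD StatusID TINYINT NOT NULL
        CONSTRAINT DF_TABLENAME_StatusID DEFAULT (1);  -- assume 1 = Active, 0 = Inactive

-- Optionally constrain to the currently known states; widen the list later as needed.
ALTER TABLE dbo.TABLENAME
    ADD CONSTRAINT CK_TABLENAME_StatusID CHECK (StatusID IN (0, 1));
```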

Doing the ALTER TABLE in place is a bit much for a large table, as you saw. One option, although not a great one, is to add a NOT NULL field - Number_1new - with a DEFAULT value (this will be nearly instant thanks to the default, at least starting with SQL Server 2012) that none of the rows would naturally have (e.g. 255), and then slowly migrate the values in a loop, as in:

 UPDATE TOP (5000) tab
 SET    tab.Number_1new = tab.Number_1
 FROM   [table] tab
 WHERE  tab.Number_1new = 255;
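That batched UPDATE is meant to be repeated until no rows remain; a minimal sketch of the driving loop (assuming the 255 sentinel default from above, and that Number_1 itself can never be 255):

```sql
-- Loop the batched UPDATE until every row has been migrated.
-- 255 is the sentinel DEFAULT that no real value can have (values never exceed 10).
DECLARE @Rows INT = 1;

WHILE (@Rows > 0)
BEGIN
    UPDATE TOP (5000) tab
    SET    tab.Number_1new = tab.Number_1
    FROM   [table] tab
    WHERE  tab.Number_1new = 255;

    SET @Rows = @@ROWCOUNT;

    WAITFOR DELAY '00:00:01';  -- brief pause to reduce blocking on a busy table
END;
```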

And when this is done, do:

 EXEC sp_rename 'table.Number_1',    'Number_1old', 'COLUMN';
 EXEC sp_rename 'table.Number_1new', 'Number_1',    'COLUMN';

Of course, it is best to wrap the renames in a transaction, and to wrap that in TRY / CATCH. Once the related code has been updated, everything has been tested, and the data looks good, you can drop the Number_1old column.
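A sketch of that wrapped swap (object names are placeholders; THROW assumes SQL Server 2012 or later):

```sql
-- Swap the columns atomically; roll back and re-raise on any failure.
BEGIN TRY
    BEGIN TRAN;

    EXEC sp_rename 'dbo.[table].Number_1',    'Number_1old', 'COLUMN';
    EXEC sp_rename 'dbo.[table].Number_1new', 'Number_1',    'COLUMN';

    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF (@@TRANCOUNT > 0)
        ROLLBACK TRAN;
    THROW;  -- re-raise the original error
END CATCH;
```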

However, the best way I have found is to create a new table, slowly migrate the data over, and then swap the tables and the code at the same time. I detailed the steps in an article on SQL Server Central: "Restructure 100 Million Row (or more) Tables in Seconds. SRSLY!" (free registration required). In case you have trouble accessing the article, here are the basic steps:

  • Create a new table with the ideal structure - [tableNew]. If you are on Enterprise Edition, consider enabling ROW or PAGE compression, as they can sometimes help. But do some research first, as there are situations where they have a negative effect. There is MSDN documentation to help you decide, as well as tools to help estimate the potential savings. Even if you do enable compression, though, I would not see it as a replacement for the project you are doing here.
  • Add an AFTER UPDATE, DELETE trigger to [table] to keep changes in sync (there is no need to worry about new rows yet).
  • Create a SQL Agent job that moves the missing rows over in batches. Do this in a loop that executes  INSERT INTO [tableNew] (Columns) SELECT TOP (n) Columns FROM [table] WHERE ?? ORDER BY ??
  • The WHERE and ORDER BY clauses depend on the situation; they should make the best use of the clustered index. If the clustered index of the new table structurally matches that of the old / current table, then at the start of each loop iteration you can get MAX([id]) from [tableNew] and use it as  WHERE table.[id] > @MaxIdInTableNew ORDER BY table.[id] .
  • Put the new table, the trigger on the current table, and the SQL Agent job in place a week or so before the full cutover. The time frame may vary with your situation, but make sure to give yourself plenty of time. It is far better for the job to have finished the row migration, trickling in only a few rows at a time, than to be 100k rows shy of the full set when the release needs to begin.
  • If the plan is to migrate other related tables (the parents of the two FKs that you want to turn into INT s), then make those fields INT now and simply don't add the FKs back until those other tables have been migrated and their INT fields exist as their PKs. You do not want to have to rebuild this table again just to change the FK columns.
  • During the cutover (inside TRY / CATCH, of course):
    • BEGIN TRAN
    • do a final row count on both tables to make sure everything has been moved (you may want to spot-check rows before the release to verify that the trigger applied updates and deletes as expected)
    • rename the current table to "old"
    • rename the "new" table so that there is no "new"
    • drop the SQL Agent job (or at least disable it)
    • rename any dependent objects, such as constraints, etc.
    • COMMIT
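The sync trigger and the migration job above can be sketched roughly as follows. All object names are placeholders based on the question's schema, the trigger copies only the columns shown, and in practice you would list every column and handle an IDENTITY id with SET IDENTITY_INSERT:

```sql
-- 1) Keep [tableNew] in sync for rows already copied over
--    (new rows are picked up by the migration job below).
CREATE TRIGGER dbo.trg_table_Sync
ON dbo.[table]
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Propagate updates for rows that already exist in the new table.
    UPDATE tn
    SET    tn.Number_1 = i.Number_1,
           tn.Number_2 = i.Number_2,
           tn.IsActive = i.IsActive
    FROM   dbo.[tableNew] tn
    INNER JOIN inserted i ON i.id = tn.id;

    -- Propagate deletes (rows in "deleted" but not in "inserted").
    DELETE tn
    FROM   dbo.[tableNew] tn
    INNER JOIN deleted d ON d.id = tn.id
    WHERE  NOT EXISTS (SELECT 1 FROM inserted i WHERE i.id = d.id);
END;
GO

-- 2) The SQL Agent job body: copy missing rows in batches,
--    keyed off the matching clustered index on [id].
DECLARE @MaxIdInTableNew BIGINT =
        (SELECT ISNULL(MAX(id), 0) FROM dbo.[tableNew]);

INSERT INTO dbo.[tableNew] (id, idResult, idItem, Number_1, Number_2, IsActive)
SELECT TOP (10000)
       t.id, t.idResult, t.idItem, t.Number_1, t.Number_2, t.IsActive
FROM   dbo.[table] t
WHERE  t.id > @MaxIdInTableNew
ORDER BY t.id;
```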

Don't drop and re-create the table.

Create a table with the appropriate column types and then insert into it in a loop.

Or you can do it one column at a time:

 UPDATE [table]
 SET    Number_1tiny = Number_1
 WHERE  Number_1 IS NOT NULL
 AND    Number_1tiny <> Number_1;

Source: https://habr.com/ru/post/1207826/

