How do you add a NOT NULL column to a large table in SQL Server?

Adding a NOT NULL column to a table that already has many records requires a DEFAULT value to be supplied, and that makes the whole ALTER TABLE command take a long time if the table is very large. This is because:

Assumptions:

  • The DEFAULT constraint modifies every existing record. That means the database has to grow each record, which forces it to move records off full data pages onto other pages, and that takes time.
  • The DEFAULT update is performed as one atomic transaction, so the transaction log has to grow enough that the whole operation can be rolled back if necessary.
  • The transaction log records the entire row, not just the changed field. So even though only one field changes, the log space required is roughly the size of the whole record multiplied by the number of existing records. That means adding a column to a table with small records will be faster than adding one to a table with large records, even if both tables have the same number of records.

Possible solutions:

  • Bite the bullet and wait for the process to complete. Just make sure the command timeout is set very long. The problem is that, depending on the number of records, this can take hours or days.
  • Add the column but allow NULL, then run an UPDATE query to set the DEFAULT value on the existing rows. Don't update everything in one statement; update batches of records at a time, or you end up with the same problem as solution #1. The drawback is that you're left with a column that allows NULL even though you know values are required, and I believe there are best-practice documents stating that you should not have nullable columns unless it's necessary.
  • Create a new table with the same schema, add the column to that schema, copy the data over from the source table, drop the original table, and rename the new one. I'm not sure whether this is any better than #1.

Questions:

  • Are my assumptions correct?
  • Are these my only solutions? If so, which one is best? Is there anything else I could do?
+49
sql-server
Nov 13 '08 at 19:28
12 answers

I ran into this problem at work, and my solution is along the lines of #2.

Here are my steps (I am using SQL Server 2005):

1) Add a column to the table with the default value:

ALTER TABLE MyTable ADD MyColumn varchar(40) DEFAULT('') 

2) Add a NOT NULL check constraint with the NOCHECK option. NOCHECK means the constraint is not validated against existing values:

 ALTER TABLE MyTable WITH NOCHECK ADD CONSTRAINT MyColumn_NOTNULL CHECK (MyColumn IS NOT NULL) 

3) Incrementally update the values in the table:

 GO
 UPDATE TOP(3000) MyTable SET MyColumn = '' WHERE MyColumn IS NULL
 GO 1000
  • The UPDATE statement will update at most 3,000 records at a time, so it works through the data one chunk at a time. I have to use "MyColumn IS NULL" because my table doesn't have a sequential primary key I can range over.

  • GO 1000 executes the previous batch 1,000 times, so this will update up to 3 million records; just increase the number if you need more. It keeps running until SQL Server returns 0 records for the UPDATE statement.
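Once the backfill is done, you could optionally swap the check constraint for a true NOT NULL column definition. This follow-up is not part of the original answer; it's a sketch that assumes the names and type used above:

 -- hypothetical cleanup once no NULLs remain (not in the original answer)
 ALTER TABLE MyTable DROP CONSTRAINT MyColumn_NOTNULL;
 ALTER TABLE MyTable ALTER COLUMN MyColumn varchar(40) NOT NULL;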

+56
Jul 19 '09 at 23:24

Here is what I would like to try:

  • Make a full backup of the database.
  • Add the new column, allowing NULL - don't set a default.
  • Switch to SIMPLE recovery, which lets the transaction log be truncated as each batch is committed.
  • SQL: ALTER DATABASE XXX SET RECOVERY SIMPLE
  • Run the update in batches as described above, committing after each one.
  • Alter the new column so that it no longer allows NULL.
  • Switch back to the normal FULL recovery model.
  • SQL: ALTER DATABASE XXX SET RECOVERY FULL
  • Back up the database again.

Using the SIMPLE recovery model doesn't stop logging, but it significantly reduces its impact, because the server discards the recovery information after each commit instead of retaining it.
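As a rough sketch of the batched update under SIMPLE recovery (the table name, column name, and batch size are assumptions, not from the answer):

 -- backfill in batches; under SIMPLE recovery the log space can be
 -- reused after each committed batch (CHECKPOINT nudges truncation along)
 WHILE 1 = 1
 BEGIN
     UPDATE TOP (5000) MyTable SET MyColumn = 0 WHERE MyColumn IS NULL;
     IF @@ROWCOUNT = 0 BREAK;
     CHECKPOINT;
 END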

+3
Nov 13 '08 at 20:08

You can:

  • Begin a transaction.
  • Take a write lock on the source table so that no one can write to it.
  • Create a shadow table with the new schema.
  • Copy all the data over from the source table.
  • Run sp_rename to move the old table out of the way.
  • Run sp_rename to give the new table the original name.
  • Finally, commit the transaction.

The advantage of this approach is that readers can still access the table during the lengthy process, and you can make any schema change this way in the background.
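A minimal sketch of that sequence, with every object name assumed for illustration (a held shared table lock blocks writers but still allows readers):

 BEGIN TRANSACTION;

 -- hold a shared table lock so no one can write while we copy
 SELECT TOP (1) 1 FROM MyTable WITH (TABLOCK, HOLDLOCK);

 -- shadow table with the new NOT NULL column
 SELECT *, CAST(0 AS int) AS MyNewColumn
 INTO MyTable_shadow
 FROM MyTable;

 EXEC sp_rename 'MyTable', 'MyTable_old';
 EXEC sp_rename 'MyTable_shadow', 'MyTable';

 COMMIT TRANSACTION;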

+2
Nov 14 '08 at 0:07

Just updating this with more recent information.

In SQL Server 2012, this can now be carried out as an online operation in the following circumstances:

  • Enterprise Edition only
  • The default must be a runtime constant

For the second requirement, examples would be a literal constant or a function such as GETDATE() that evaluates to the same value for all rows. A default of NEWID() would not qualify and would still update all rows there and then.

For defaults that do qualify, SQL Server evaluates them and stores the result as the default value in the column metadata, so the result doesn't depend on the default constraint (which can even be dropped later if no longer required). This can be viewed in sys.system_internals_partition_columns. The value is not written to the rows until the next time they are updated.
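A hedged illustration of the distinction (the table and column names here are made up):

 -- metadata-only on Enterprise Edition 2012+: GETDATE() is evaluated once
 ALTER TABLE BigTable ADD CreatedOn datetime NOT NULL DEFAULT (GETDATE());

 -- not metadata-only: NEWID() differs per row, so every row gets written
 ALTER TABLE BigTable ADD RowGuid uniqueidentifier NOT NULL DEFAULT (NEWID());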

More on this here: Online non-NULL with values column add in SQL Server 2012

+2
Jan 07 '13 at 11:10

I think it depends on the SQL flavor you are using, but what if you took option 2, and at the very end altered the column so that it is NOT NULL with a default?

Would that be fast, since it would see that all the values are non-null?
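For reference, that final step would look something like the sketch below (names assumed). Note that SQL Server still has to scan the table to verify that no NULLs remain before it accepts the change:

 ALTER TABLE MyTable ALTER COLUMN MyColumn int NOT NULL;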

0
Nov 13 '08 at 19:33

If you have to have the column in that one table, you just have to do it. Option 3 is potentially the best for this, because you can keep using the database live while the operation runs. If you use option 1, the table is locked for the duration of the operation, and then you're really stuck.

If you don't really care whether the column is physically in that table, then the segmentation approach is probably best. Although I really try to avoid it (to the point that I just don't), because, as Charles Bretana says, you have to make sure you find every place that updates or inserts into this table and modify it. Ugh!

0
Nov 13 '08 at 23:22

I had a similar problem and went with your option #2. It took 20 minutes this way, as opposed to 32 hours the other way!!! A huge difference, thanks for the tip. I wrote a full blog post about it, but here is the important SQL:

 Alter table MyTable
 Add MyNewColumn char(10) null default '?';
 go

 update MyTable set MyNewColumn='?' where MyPrimaryKey between 0 and 1000000
 go
 update MyTable set MyNewColumn='?' where MyPrimaryKey between 1000000 and 2000000
 go
 update MyTable set MyNewColumn='?' where MyPrimaryKey between 2000000 and 3000000
 go
 ..etc..

 Alter table MyTable
 Alter column MyNewColumn char(10) not null;

And a blog post if you're interested: http://splinter.com.au/adding-a-column-to-a-massive-sql-server-table

0
Mar 16 '09 at 3:13

I had a similar problem and went with a modified #3 approach. In my case, the database was in SIMPLE recovery mode, and no FK constraints referenced the table the column was being added to.

Instead of creating a new table with the same schema and copying the contents of the original table, I used the SELECT ... INTO syntax.

According to Microsoft ( http://technet.microsoft.com/en-us/library/ms188029(v=sql.105).aspx )

The amount of logging for SELECT ... INTO depends on the recovery model in effect for the database. Under the simple recovery model or bulk-logged recovery model, bulk operations are minimally logged. With minimal logging, using a SELECT ... INTO statement can be more efficient than creating a table and then populating the table with an INSERT statement. For more information, see Operations That Can Be Minimally Logged.

Sequence of steps:

1. Move the data from the old table to the new one, adding the new column with a default value:

  SELECT table.*, CAST('default' AS nvarchar(256)) AS new_column
  INTO table_copy
  FROM table

2. Drop the old table:

  DROP TABLE table 

3. Rename the newly created table:

  EXEC sp_rename 'table_copy', 'table' 

4. Create the necessary constraints and indexes on the new table.
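The answer gives no SQL for this step; as a purely hypothetical illustration (the key column and the constraint/index names are invented):

  ALTER TABLE [table] ADD CONSTRAINT PK_table PRIMARY KEY CLUSTERED (id);
  CREATE NONCLUSTERED INDEX IX_table_new_column ON [table] (new_column);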

In my case, the table had more than 100 million rows, and this approach completed faster than approach #2, with minimal log-space growth.

0
Oct 02 '13 at 15:59

Admittedly, this is an old question. But recently a colleague told me he was able to do this in a single ALTER TABLE statement on a table with 13.6M rows. It completed in a second on SQL Server 2012, and I was able to confirm the same on a table with 8M rows. Has something changed in later versions of SQL Server?

 Alter table mytable add mycolumn char(1) not null default('N'); 
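This is presumably the SQL Server 2012 metadata-only column add described in an earlier answer, since 'N' is a runtime constant. One rough way to inspect the stored default (sys.system_internals_partition_columns is an internal view, so treat this as a sketch and expect its column layout to vary by version):

 SELECT pc.*
 FROM sys.system_internals_partition_columns AS pc
 JOIN sys.partitions AS p ON p.partition_id = pc.partition_id
 WHERE p.object_id = OBJECT_ID('mytable');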
0
May 25 '15 at 18:01

1) Add a column to the table with the default value:

 ALTER TABLE MyTable ADD MyColumn int default 0 

2) Incrementally update the values in the table (same effect as the accepted answer). Adjust the number of records updated per batch to suit your environment, to avoid blocking other users/processes:

 declare @rowcount int = 1
 while (@rowcount > 0)
 begin
     UPDATE TOP(10000) MyTable SET MyColumn = 0 WHERE MyColumn IS NULL
     set @rowcount = @@ROWCOUNT
 end

3) Alter the column definition so that it does not allow NULL. Run the following while the table is not in use (or plan for a few minutes of downtime). I have used this successfully on tables with millions of records:

 ALTER TABLE MyTable ALTER COLUMN MyColumn int NOT NULL 
0
Nov 11 '16 at 17:03

I would use a CURSOR instead of a single UPDATE. The cursor updates all the relevant records record by record - it takes time, but it doesn't lock the table.

If you want to avoid locks even further, use WAITFOR.

Also, I'm not sure that the DEFAULT constraint by itself modifies existing rows. It's probably the NOT NULL constraint combined with DEFAULT that forces the update in the case the author describes.

If it does modify them, add the constraint at the end. The pseudocode would look like this:

 -- without the NOT NULL constraint - we will add it at the end
 ALTER TABLE table ADD new_column INT DEFAULT 0

 DECLARE fillNullColumn CURSOR LOCAL FAST_FORWARD FOR
     SELECT key FROM table WITH (NOLOCK) WHERE new_column IS NULL

 OPEN fillNullColumn

 DECLARE @key INT
 FETCH NEXT FROM fillNullColumn INTO @key

 WHILE @@FETCH_STATUS = 0
 BEGIN
     UPDATE table WITH (ROWLOCK)
     SET new_column = 0 -- default value
     WHERE key = @key

     WAITFOR DELAY '00:00:05' -- wait 5 seconds; keep in mind this updates only 12 rows per minute
     FETCH NEXT FROM fillNullColumn INTO @key
 END

 CLOSE fillNullColumn
 DEALLOCATE fillNullColumn

 ALTER TABLE table ALTER COLUMN new_column INT NOT NULL

I am sure some syntax errors remain (table, key, and new_column are placeholders), but I hope this helps solve your problem.

Good luck

-1
Nov 13 '08 at 23:15

Segment the table vertically. This means you will have two tables with the same primary key and exactly the same number of records: the one you already have, and a new one that holds only the key and the new column, which can be NOT NULL (with a default value). Change all the insert, update, and delete code so that the two tables stay in sync. If you want, you can create a view that "joins" the two tables together, so the pair looks like a single logical table to client code...
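A minimal sketch of that layout, with every object name assumed for illustration (it presumes MyTable has an int primary key called Id):

 -- side table holding just the key and the new NOT NULL column
 CREATE TABLE MyTable_Ext (
     Id int NOT NULL PRIMARY KEY REFERENCES MyTable (Id),
     NewColumn int NOT NULL DEFAULT (0)
 );
 GO
 -- a view that presents the pair as one logical table
 CREATE VIEW MyTableView AS
 SELECT t.*, e.NewColumn
 FROM MyTable AS t
 JOIN MyTable_Ext AS e ON e.Id = t.Id;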

-3
Nov 13 '08 at 19:34


