Suggest a method for updating data in many tables with random data?

I have about 25 tables that I would like to update with random data selected from a subset of existing data. I want the data to be picked at random but still be meaningful, for example changing all the first names in the database to new names in a random order. So I do not want random garbage in the fields; I want to pull the values from a temp table that is populated ahead of time.

The only way I can think of to do this is with a loop and some dynamic SQL.

  • insert the pool of replacement names into a temp table with an id field
  • foreach table name in the list of tables:
    • build dynamic SQL that updates every name to one selected at random from the temp table, based on rand() * max(id) from the temp table
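
The loop described above can be sketched roughly as follows. This is only an illustration, not a tested solution for SQL Server: it uses Python with an in-memory SQLite database as a stand-in, and the table names, column names, `name_pool`, `quote_ident` and `scramble_names` are all made up for the example. It also assumes the ids in the temp table run from 1 to max(id) without gaps.

```python
import random
import sqlite3


def quote_ident(name):
    """Quote an identifier so it can be spliced into dynamic SQL."""
    return '"' + name.replace('"', '""') + '"'


def scramble_names(conn, targets, rng):
    """For each (table, column) pair, overwrite the column with names
    picked at random from the name_pool temp table."""
    max_id = conn.execute("SELECT max(id) FROM name_pool").fetchone()[0]
    for table, column in targets:
        rowids = [r[0] for r in conn.execute(
            f"SELECT rowid FROM {quote_ident(table)}")]
        # Dynamic SQL: only the identifiers are spliced in, values are bound.
        sql = (f"UPDATE {quote_ident(table)} "
               f"SET {quote_ident(column)} = "
               f"(SELECT name FROM name_pool WHERE id = ?) "
               f"WHERE rowid = ?")
        # The random id is chosen on the client, one pick per row.
        conn.executemany(
            sql, [(rng.randint(1, max_id), rid) for rid in rowids])
    conn.commit()


# Hypothetical demo data: two denormalized tables with a first-name column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT)")
conn.execute("CREATE TABLE employees (fname TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)",
                 [(f"orig{i}",) for i in range(20)])
conn.executemany("INSERT INTO employees VALUES (?)",
                 [(f"orig{i}",) for i in range(5)])

# The temp table populated ahead of time with the replacement names.
conn.execute("CREATE TEMP TABLE name_pool (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO name_pool (name) VALUES (?)",
                 [("Alice",), ("Bob",), ("Carol",), ("Dave",)])

targets = [("customers", "first_name"), ("employees", "fname")]
scramble_names(conn, targets, random.Random())
```

Picking the id on the client rather than inside the UPDATE is a deliberate choice in this sketch: as far as I know, RAND() in SQL Server is evaluated once per statement, so a plain set-based UPDATE would give every row the same name.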

But any time I think "loop" in SQL, I figure I'm doing something wrong.

There are many denormalized tables in this database, so I think I need a loop (the first name fields are scattered across the database).

Is there a better way?

+4
3 answers

Breaking the fourth wall a little by answering my own question.

I tried this as a SQL script. What I found out is that SQL is pretty bad at randomness. The script was slow and awkward: it needed functions that referenced views created only for the script, and those could not be created in tempdb.

So, I made a console application.

  • Generate your random data; this is easy with the Random class (just remember to use only one instance of Random).
  • Find out which columns and table names you want to update with a script that looks at INFORMATION_SCHEMA.
  • Get the identifiers for all the tables that you are going to update, if possible (and wow, this will be slow if you have a large table that does not have good PKs).
  • Update each table 100 rows at a time. Why 100? No idea. Maybe 1000 would work too. I just picked a number. A dictionary is handy here: pick a random id from the dict using the Random class.
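
A rough sketch of those steps, in Python rather than the console application described here, and against SQLite instead of SQL Server. Everything named in it is an assumption for illustration: `find_name_columns` reads SQLite's `sqlite_master` and `PRAGMA table_info` where the real thing would query INFORMATION_SCHEMA, and the `people` table, the `names` dict and `update_in_batches` are hypothetical.

```python
import random
import sqlite3

BATCH_SIZE = 100          # arbitrary, as in the steps above
rng = random.Random()     # one shared instance for the whole run


def find_name_columns(conn, wanted=("first_name", "fname")):
    """Return (table, column) pairs whose column name looks like a first name."""
    targets = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            if col[1].lower() in wanted:   # col[1] is the column name
                targets.append((table, col[1]))
    return targets


def update_in_batches(conn, table, column, key, names, batch_size=BATCH_SIZE):
    """Overwrite `column` with random names, committing every `batch_size` rows.
    `names` is a dict keyed 1..len(names). Returns the number of batches."""
    ids = [r[0] for r in conn.execute(f'SELECT "{key}" FROM "{table}"')]
    sql = f'UPDATE "{table}" SET "{column}" = ? WHERE "{key}" = ?'
    batches = 0
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        conn.executemany(
            sql, [(names[rng.randint(1, len(names))], i) for i in chunk])
        conn.commit()      # many small transactions instead of one big one
        batches += 1
    return batches


# Hypothetical demo: 250 rows, so three batches of at most 100 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany("INSERT INTO people (first_name) VALUES (?)",
                 [(f"orig{i}",) for i in range(250)])
conn.commit()

names = {1: "Alice", 2: "Bob", 3: "Carol", 4: "Dave"}
found = find_name_columns(conn)
batch_counts = {table: update_in_batches(conn, table, column, "id", names)
                for table, column in found}
```

Committing after every batch keeps each transaction short, which fits the many small updates described in this answer; the batch size itself is as arbitrary here as it was there.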

Wash, rinse, repeat. This way I ended up updating about 2.2 million rows per hour. It could probably be faster, but it was doing a lot of small updates, so it did not get in anyone's way.

0

Red Gate has a product called SQL Data Generator that can generate fake names and other fake data for testing purposes. It's not free, but they have a trial version, so you can check it out, and it may be faster than trying to do it yourself.

(Disclaimer: I have never used this product, but I was very pleased with some of their other products.)

+3

I wrote a stored procedure to do something like this a while ago. It is not as good as the Red Gate product and it only handles names, but if you need something quick and dirty, you can download it from

http://www.joebooth-consulting.com/products/

The script is named GenRandNames.sql.

Hope this helps

+1

Source: https://habr.com/ru/post/1301511/

