Delete all rows in all tables that are no longer referenced by any FK

To trim the production database for loading into the test system, we deleted rows in many tables. This has left us with cruft in several tables, namely rows that are no longer used in any FK relationship. What I want to achieve is similar to garbage collection in Java.

Or, to put it another way: I have M tables in the database. N of them (i.e. most, but not all) have foreign key relationships. I deleted a couple of high-level rows (i.e. rows with only outgoing FK relationships) through SQL. That leaves the rows in the related tables untouched.

Does anyone have a SQL stored procedure or a Java program that finds the N tables and then follows all FK relationships to delete rows that are no longer needed?

If finding the N tables is too difficult, I could provide the script with a list of tables to scan or, preferably, a negative list of tables to ignore.

Also note:

  • We have several tables that are used in many (> 50) FK relationships, i.e. A, B, C, ... all use rows in Z.
  • All FK relationships use the technical PK column, which is always a single column.
3 answers

Even simple stored procedures are usually a little ugly, and this was an interesting exercise in pushing stored procedures well beyond the point where they are easy to work with.

To use the code below, start your MySQL shell, use your target database, paste in the large block of stored procedures from below, and then run

 CALL delete_orphans_from_all_tables(); 

to remove all orphaned rows from all tables in your database.

For a zoomed-out overview:

  • delete_orphans_from_all_tables is the entry point. All other sprocs carry the dofat prefix to make it clear that they belong to delete_orphans_from_all_tables and to make it less noisy to have them lying around.
  • delete_orphans_from_all_tables works by calling dofat_delete_orphans_from_all_tables_iter repeatedly until no more rows are deleted.
  • dofat_delete_orphans_from_all_tables_iter works by looping over all tables that are targets of foreign key constraints and, for each table, deleting all rows that are not currently referenced from anywhere.
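The control flow described above can be sketched outside MySQL. The following is a minimal in-memory illustration, not the stored procedures themselves: the table model, the function names `sweep_once` and `delete_orphans`, and the `ignore` parameter (the negative list mentioned in the question) are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical in-memory model: table name -> set of primary-key values.
Tables = Dict[str, Set[int]]
# Each reference is (child_table, child_pk, parent_table, parent_pk).
Reference = Tuple[str, int, str, int]


def sweep_once(tables: Tables, refs: List[Reference], targets: Set[str]) -> int:
    """One pass over all FK target tables; returns the number of rows deleted."""
    deleted = 0
    for name in sorted(targets):
        # Only references held by rows that still exist keep a parent alive.
        referenced = {
            parent_pk
            for child, child_pk, parent, parent_pk in refs
            if parent == name and child_pk in tables[child]
        }
        orphans = tables[name] - referenced
        tables[name] -= orphans
        deleted += len(orphans)
    return deleted


def delete_orphans(tables: Tables, refs: List[Reference],
                   ignore: FrozenSet[str] = frozenset()) -> int:
    """Repeat single passes until a pass deletes nothing (a fixed point)."""
    targets = {parent for _, _, parent, _ in refs} - set(ignore)
    total = 0
    while True:
        deleted = sweep_once(tables, refs, targets)
        if deleted == 0:
            return total
        total += deleted


# A references M, M references B; the A row that pointed at M 21 is already gone.
example_tables: Tables = {"A": {1}, "M": {20, 21}, "B": {30, 31}}
example_refs: List[Reference] = [
    ("A", 1, "M", 20), ("M", 20, "B", 30), ("M", 21, "B", 31),
]
example_total = delete_orphans(example_tables, example_refs)
```

In this example B is visited before M, so the first pass removes only M 21 and a second pass is needed to remove B 31. That is why a single pass is not enough and the outer loop runs until nothing is deleted.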

Here is the code:

    delimiter //

    CREATE PROCEDURE dofat_store_tables_targeted_by_foreign_keys ()
    BEGIN
        -- This procedure creates a temporary table called TargetTableNames
        -- containing the names of all tables that are the target of any foreign
        -- key relation.

        SET @db_name = DATABASE();

        DROP TEMPORARY TABLE IF EXISTS TargetTableNames;
        CREATE TEMPORARY TABLE TargetTableNames (
            table_name VARCHAR(255) NOT NULL
        );

        PREPARE stmt FROM
            'INSERT INTO TargetTableNames(table_name)
             SELECT DISTINCT referenced_table_name
             FROM INFORMATION_SCHEMA.key_column_usage
             WHERE referenced_table_schema = ?';

        EXECUTE stmt USING @db_name;
    END//

    CREATE PROCEDURE dofat_deletion_clause_for_table(
        IN table_name VARCHAR(255), OUT result text
    )
    DETERMINISTIC
    BEGIN
        -- Given a table Foo, where Foo.col1 is referenced by Bar.col1, and
        -- Foo.col2 is referenced by Qwe.col3, this will return a string like:
        --
        -- NOT (Foo.col1 IN (SELECT col1 FROM Bar) <=> 1) AND
        -- NOT (Foo.col2 IN (SELECT col3 FROM Qwe) <=> 1)
        --
        -- This is used by dofat_delete_orphans_from_table to target only orphaned
        -- rows.
        --
        -- The odd-looking `NOT (x IN y <=> 1)` construct is used in favour of the
        -- more obvious (x NOT IN y) construct to handle nulls properly; note that
        -- (x NOT IN y) will evaluate to NULL if either x is NULL or if x is not in
        -- y and *any* value in y is NULL.

        SET @db_name = DATABASE();
        SET @table_name = table_name;

        -- GROUP_CONCAT output is truncated at group_concat_max_len (1024 by
        -- default), which a table with many incoming foreign keys can exceed.
        SET SESSION group_concat_max_len = 1000000;

        PREPARE stmt FROM
            'SELECT GROUP_CONCAT(
                 CONCAT(
                     \'NOT (\', @table_name, \'.\', referenced_column_name, \' IN (\',
                     \'SELECT \', column_name, \' FROM \', table_name, \')\',
                     \' <=> 1)\'
                 )
                 SEPARATOR \' AND \'
             ) INTO @result
             FROM INFORMATION_SCHEMA.key_column_usage
             WHERE referenced_table_schema = ?
               AND referenced_table_name = ?';
        EXECUTE stmt USING @db_name, @table_name;

        SET result = @result;
    END//

    CREATE PROCEDURE dofat_delete_orphans_from_table (table_name varchar(255))
    BEGIN
        -- Takes as an argument the name of a table that is the target of at least
        -- one foreign key.
        -- Deletes from that table all rows that are not currently referenced by
        -- any foreign key.

        CALL dofat_deletion_clause_for_table(table_name, @deletion_clause);

        -- Use the parameter directly rather than relying on @table_name having
        -- been set as a side effect of the call above.
        SET @stmt = CONCAT(
            'DELETE FROM ', table_name,
            ' WHERE ', @deletion_clause
        );

        PREPARE stmt FROM @stmt;
        EXECUTE stmt;
    END//

    CREATE PROCEDURE dofat_delete_orphans_from_all_tables_iter(
        OUT rows_deleted INT
    )
    BEGIN
        -- dofat_store_tables_targeted_by_foreign_keys must be called before this
        -- will work.
        --
        -- Loops ONCE over all tables that are currently referenced by a foreign
        -- key. For each table, deletes all rows that are not currently referenced.
        -- Note that this is not guaranteed to leave all tables without orphans,
        -- since the deletion of rows from a table late in the sequence may leave
        -- rows from a table early in the sequence orphaned.

        DECLARE loop_done BOOL;

        -- Variable name needs to differ from the column name we use to populate it
        -- because of bug http://bugs.mysql.com/bug.php?id=28227
        DECLARE table_name_ VARCHAR(255);

        DECLARE curs CURSOR FOR SELECT table_name FROM TargetTableNames;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET loop_done = TRUE;

        SET rows_deleted = 0;
        SET loop_done = FALSE;

        OPEN curs;
        REPEAT
            FETCH curs INTO table_name_;
            -- Skip the body once the cursor is exhausted; otherwise the last
            -- table is processed twice and an empty table list fails.
            IF NOT loop_done THEN
                CALL dofat_delete_orphans_from_table(table_name_);
                SET rows_deleted = rows_deleted + ROW_COUNT();
            END IF;
        UNTIL loop_done END REPEAT;
        CLOSE curs;
    END//

    CREATE PROCEDURE delete_orphans_from_all_tables ()
    BEGIN
        CALL dofat_store_tables_targeted_by_foreign_keys();
        REPEAT
            CALL dofat_delete_orphans_from_all_tables_iter(@rows_deleted);
        UNTIL @rows_deleted = 0 END REPEAT;
    END//

    delimiter ;
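The NULL handling mentioned in the code comments is easy to get wrong, so here is a small sketch of it. It assumes SQLite (via Python's standard sqlite3 module) purely for illustration, with SQLite's null-safe `IS` standing in for MySQL's `<=>`; the `parent` and `child` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    -- parent_id is a nullable referencing column, as FK columns often are.
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO parent VALUES (1), (2);
    INSERT INTO child VALUES (10, 1), (11, NULL);
""")

# Naive form: for parent 2, "2 NOT IN (1, NULL)" evaluates to NULL, not TRUE,
# so the orphaned row is silently skipped.
naive = conn.execute(
    "SELECT id FROM parent "
    "WHERE id NOT IN (SELECT parent_id FROM child) ORDER BY id"
).fetchall()

# Null-safe form: "(x IN y) IS 1" turns a NULL result into FALSE before it is
# negated, which mirrors NOT (x IN y <=> 1) in the MySQL code.
null_safe = conn.execute(
    "SELECT id FROM parent "
    "WHERE NOT ((id IN (SELECT parent_id FROM child)) IS 1) ORDER BY id"
).fetchall()

print("naive:", naive)
print("null-safe:", null_safe)
```

With this data the naive query finds no orphans at all, while the null-safe query reports parent 2. A single NULL in any referencing column would therefore be enough to stop the plain NOT IN version from deleting anything from that table.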

As an aside, this exercise taught me a few of the things that make writing code of this complexity in MySQL sprocs a frustrating business. I mention them only because they may help you, or a curious future reader, understand what look like crazy stylistic choices in the code above.

  • Hugely verbose syntax and boilerplate for simple things, e.g.
    • variables must be declared and assigned on separate lines
    • delimiters must be set around procedure definitions
    • you must use the PREPARE / EXECUTE combination to run dynamic SQL.
  • Lack of referential transparency:
    • PREPARE stmt FROM CONCAT( ... ); is a syntax error, while SET @foo = CONCAT( ... ); PREPARE stmt FROM @foo; is not.
    • EXECUTE stmt USING @foo is fine, but EXECUTE stmt USING foo, where foo is a procedure variable, is a syntax error.
    • A SELECT statement and a procedure whose last statement is a SELECT both return a result set, but pretty much everything you would ever want to do with a result set (like looping over it or checking whether something is IN it) can only target a SELECT statement, not a CALL statement.
    • You can pass a session variable as an OUT parameter to a sproc, but you cannot pass a sproc variable as an OUT parameter to a sproc.
  • Completely arbitrary restrictions and bizarre behaviours that blindside you:
    • Dynamic SQL is not allowed in functions, only in procedures.
    • Using a cursor to fetch from a column into a procedure variable of the same name always sets the variable to NULL, but raises no warning or error.
  • Inability to cleanly pass result sets between procedures

    Result sets are the basic type of SQL; they are what SELECT statements return, and you think of them as objects when using SQL from the application layer. But within a MySQL sproc you cannot assign them to variables or pass them from one sproc to another. If you really need this functionality, one sproc has to write the result set into a temporary table so that another sproc can read it.

  • Bizarre and unfamiliar constructs and idioms:
    • There are three ways to assign a variable: SET foo = bar, SELECT @foo := bar and SELECT bar INTO foo.
    • You would expect to use procedure variables for all your state and to avoid session variables, for the same reasons you avoid global variables in a normal programming language. In fact you need to use session variables everywhere, because so many language constructs (such as OUT parameters and EXECUTE) will not accept any other kind of variable.
    • The syntax for using a cursor to loop over a result set just looks alien.

Despite these obstacles, you can still cobble together small programs like this with sprocs if you are determined.


This issue is addressed in the MySQL performance blog, http://www.percona.com/blog/2011/11/18/eventual-consistency-in-mysql/

It provides the following meta-query for generating queries that will identify orphaned nodes:

    SELECT CONCAT(
        'SELECT ', GROUP_CONCAT(DISTINCT CONCAT(K.CONSTRAINT_NAME, '.', P.COLUMN_NAME,
          ' AS `', P.TABLE_SCHEMA, '.', P.TABLE_NAME, '.', P.COLUMN_NAME, '`') ORDER BY P.ORDINAL_POSITION), ' ',
        'FROM ', K.TABLE_SCHEMA, '.', K.TABLE_NAME, ' AS ', K.CONSTRAINT_NAME, ' ',
        'LEFT OUTER JOIN ', K.REFERENCED_TABLE_SCHEMA, '.', K.REFERENCED_TABLE_NAME, ' AS ', K.REFERENCED_TABLE_NAME, ' ',
        ' ON (', GROUP_CONCAT(CONCAT(K.CONSTRAINT_NAME, '.', K.COLUMN_NAME) ORDER BY K.ORDINAL_POSITION),
        ') = (', GROUP_CONCAT(CONCAT(K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME) ORDER BY K.ORDINAL_POSITION), ') ',
        'WHERE ', K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME, ' IS NULL;'
        ) AS _SQL
    FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE K
    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE P
      ON (K.TABLE_SCHEMA, K.TABLE_NAME) = (P.TABLE_SCHEMA, P.TABLE_NAME)
      AND P.CONSTRAINT_NAME = 'PRIMARY'
    WHERE K.REFERENCED_TABLE_NAME IS NOT NULL
    GROUP BY K.CONSTRAINT_NAME;

I converted it to find childless parents, producing:

    SELECT CONCAT(
        'SELECT ', GROUP_CONCAT(CONCAT(K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME) ORDER BY K.ORDINAL_POSITION), ' ',
        'FROM ', K.REFERENCED_TABLE_SCHEMA, '.', K.REFERENCED_TABLE_NAME, ' AS ', K.REFERENCED_TABLE_NAME, ' ',
        'LEFT OUTER JOIN ', K.TABLE_SCHEMA, '.', K.TABLE_NAME, ' AS ', K.CONSTRAINT_NAME, ' ',
        ' ON (', GROUP_CONCAT(CONCAT(K.CONSTRAINT_NAME, '.', K.COLUMN_NAME) ORDER BY K.ORDINAL_POSITION),
        ') = (', GROUP_CONCAT(CONCAT(K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME) ORDER BY K.ORDINAL_POSITION), ') ',
        'WHERE ', K.CONSTRAINT_NAME, '.', K.COLUMN_NAME, ' IS NULL;'
        ) AS _SQL
    FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE K
    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE P
      ON (K.TABLE_SCHEMA, K.TABLE_NAME) = (P.TABLE_SCHEMA, P.TABLE_NAME)
      AND P.CONSTRAINT_NAME = 'PRIMARY'
    WHERE K.REFERENCED_TABLE_NAME IS NOT NULL
    GROUP BY K.CONSTRAINT_NAME;
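Each row produced by the meta-query above is itself a query of the LEFT OUTER JOIN ... IS NULL shape. As a rough sketch of what one generated query does, the example below runs that shape by hand against SQLite (assumed here only so the example is self-contained); the tables `z` and `a` and the alias `fk_a_z` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE z (id INTEGER PRIMARY KEY);
    CREATE TABLE a (id INTEGER PRIMARY KEY, z_id INTEGER REFERENCES z(id));
    INSERT INTO z VALUES (1), (2), (3);
    INSERT INTO a VALUES (10, 1), (11, 1), (12, 3);
""")

# Parent rows with no matching child come back with NULLs on the child side
# of the outer join, so filtering on IS NULL keeps exactly those parents.
childless = [
    row[0]
    for row in conn.execute("""
        SELECT z.id
        FROM z
        LEFT OUTER JOIN a AS fk_a_z ON fk_a_z.z_id = z.id
        WHERE fk_a_z.z_id IS NULL
        ORDER BY z.id
    """)
]
print(childless)
```

Note that the meta-query groups by constraint name, so each generated query looks at one foreign key only. A row reported as childless for one constraint may still be referenced through another, so the results would have to be combined across all constraints targeting a table before deleting anything.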

Since I ran into some weird SQL syntax errors, here is a solution that combines the SQL from the accepted answer with Groovy. Use orphanedNodeStatistics() to get the number of nodes per table that would be deleted, dumpOrphanedNodes(String tableName) to dump the PKs of the nodes that would be deleted, and deleteOrphanedNodes(String tableName) to delete them.

To delete all of them, iterate over the set returned by tablesTargetedByForeignKeys().

    import groovy.sql.Sql

    class OrphanNodesTool {

        Sql sql;
        String schema;

        // Names of all tables that are the target of at least one foreign key.
        Set<String> tablesTargetedByForeignKeys() {
            def query = '''\
                SELECT referenced_table_name
                FROM INFORMATION_SCHEMA.key_column_usage
                WHERE referenced_table_schema = ?
                '''
            def result = new TreeSet()
            sql.eachRow( query, [ schema ] ) { row ->
                result << row[0]
            }
            return result
        }

        // Builds the WHERE clause that matches rows no foreign key references.
        String conditionsToFindOrphans( String tableName ) {
            List<String> conditions = []

            def query = '''\
                SELECT referenced_column_name, column_name, table_name
                FROM INFORMATION_SCHEMA.key_column_usage
                WHERE referenced_table_schema = ?
                  AND referenced_table_name = ?
                '''
            sql.eachRow( query, [ schema, tableName ] ) { row ->
                conditions << "NOT (${tableName}.${row.referenced_column_name} IN (SELECT ${row.column_name} FROM ${row.table_name}) <=> 1)"
            }

            return conditions.join( '\nAND ' )
        }

        // Assumes the PK column is named <tableName>_ID.
        List<Long> listOrphanedNodes( String tableName ) {
            def query = """\
                SELECT ${tableName}.${tableName}_ID
                FROM ${tableName}
                WHERE ${conditionsToFindOrphans(tableName)}
                """.toString()

            def result = []
            sql.eachRow( query ) { row ->
                result << row[0]
            }
            return result
        }

        void dumpOrphanedNodes( String tableName ) {
            def pks = listOrphanedNodes( tableName )
            println( String.format( "%8d %s", pks.size(), tableName ) )
            if( pks.size() < 10 ) {
                pks.each {
                    println( String.format( "%16d", it as long ) )
                }
            } else {
                // Print longer lists 20 keys per line; collate() returns the
                // chunks, so they are iterated with each().
                pks.collate( 20 ).each { chunk ->
                    chunk.each {
                        print( String.format( "%16d ", it as long ) )
                    }
                    println()
                }
            }
        }

        int countOrphanedNodes( String tableName ) {
            def query = """\
                SELECT COUNT(*)
                FROM ${tableName}
                WHERE ${conditionsToFindOrphans(tableName)}
                """.toString()

            int result;
            sql.eachRow( query ) { row ->
                result = row[0]
            }
            return result
        }

        int deleteOrphanedNodes( String tableName ) {
            def query = """\
                DELETE
                FROM ${tableName}
                WHERE ${conditionsToFindOrphans(tableName)}
                """.toString()

            // executeUpdate() returns the number of affected rows;
            // execute() only returns a boolean.
            int result = sql.executeUpdate( query )
            return result
        }

        void orphanedNodeStatistics() {
            def tableNames = tablesTargetedByForeignKeys()
            for( String tableName : tableNames ) {
                int n = countOrphanedNodes( tableName )
                println( String.format( "%8d %s", n, tableName ) )
            }
        }
    }



Source: https://habr.com/ru/post/1204253/

