BigQuery - deleting rows from a partitioned table

I have a day-partitioned table on BigQuery. When I try to delete some rows from the table using a query like:

DELETE FROM `MY_DATASET.partitioned_table` WHERE id = 2374180 

I get the following error:

Error: DML statements are not yet supported on partitioned tables.

A quick Google search leads me to https://cloud.google.com/bigquery/docs/loading-data-sql-dml , which also says: "DML statements that modify partitioned tables are not yet supported."

So, is there a workaround we can use when deleting rows from a partitioned table?

+5
3 answers

DML has some known issues / limitations at this point.

For instance:

  • DML statements cannot be used to modify tables with REQUIRED fields in their schema.
  • Each DML statement initiates an implicit transaction, which means that changes made by the statement are automatically committed at the end of each successful DML statement. Multi-statement transactions are not supported.
  • Only the following combinations of DML statements may run concurrently against a table:
    UPDATE and INSERT
    DELETE and INSERT
    INSERT and INSERT
    Otherwise, one of the DML statements is aborted. For example, if two UPDATE statements run concurrently against a table, only one of them succeeds.
  • Tables that were recently written to via BigQuery streaming (tabledata.insertAll) cannot be modified using UPDATE or DELETE statements. To check whether a table has a streaming buffer, look in the tables.get response for a section called streamingBuffer. If it is absent, the table can be modified using UPDATE or DELETE statements.
  • DML statements that modify partitioned tables are not yet supported.
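
The streaming-buffer check from the list above can be sketched with the bq command-line tool, whose `bq show --format=prettyjson` output mirrors the tables.get response. This is only a rough sketch: the helper name has_streaming_buffer and the table mydataset.mytable are assumptions made for illustration, and the grep is a plain text match rather than a real JSON parse.

```shell
#!/bin/sh
# Hypothetical helper (not part of bq): reads a tables.get JSON document
# on stdin and exits 0 if it contains a "streamingBuffer" section.
# This is a plain text match, not a JSON parser.
has_streaming_buffer() {
  grep -q '"streamingBuffer"'
}

# Usage (needs the bq CLI; mydataset.mytable is a placeholder):
#   if bq show --format=prettyjson mydataset.mytable | has_streaming_buffer; then
#     echo "streaming buffer present: UPDATE/DELETE would be rejected for now"
#   fi
```

The helper reads from stdin so that it can be tried against a saved tables.get response as well as against live `bq show` output.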

Also pay attention to quota limits.

  • Maximum UPDATE / DELETE statements per day for each table: 48
  • Maximum UPDATE / DELETE statements per day per project: 500
  • Maximum INSERT statements per day for each table: 1000
  • Maximum INSERT statements per day per project: 10,000

What you can do is copy the entire partition into a non-partitioned table and run the DML statement there. Then write the temporary table back to the partition. In addition, if you run into the quota on DML updates per day per table, you can create a copy of the table and run the DML against the new table to avoid the limit.
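
The copy / DML / copy-back workaround described above might look roughly like this with the bq CLI. Treat it as a sketch under assumptions, not a tested recipe: the function name delete_via_temp_table and the temporary table name `<table>_tmp_<day>` are invented here, and it assumes that `bq cp -f` onto a `table$YYYYMMDD` decorator replaces only that partition. Setting RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: delete rows from one daily partition via a temporary table.
# Assumes the bq CLI is installed and authenticated. The function name
# and the "<table>_tmp_<day>" naming are hypothetical.
# Set RUN=echo for a dry run that only prints the commands.
delete_via_temp_table() {
  dataset=$1   # e.g. mydataset
  table=$2     # e.g. mytable (the partitioned table)
  day=$3       # partition as YYYYMMDD, e.g. 20171207
  where=$4     # DELETE condition, e.g. 'id = 2374180'

  tmp="${table}_tmp_${day}"
  part="${dataset}.${table}\$${day}"   # partition decorator: table$YYYYMMDD

  # 1. Copy the single partition into a plain (non-partitioned) table.
  ${RUN:-} bq cp -f "$part" "${dataset}.${tmp}" || return 1
  # 2. Run the DML statement against the copy (standard SQL).
  ${RUN:-} bq query --use_legacy_sql=false \
    "DELETE FROM \`${dataset}.${tmp}\` WHERE ${where}" || return 1
  # 3. Copy the result back, overwriting the original partition.
  ${RUN:-} bq cp -f "${dataset}.${tmp}" "$part" || return 1
  # 4. Drop the temporary table.
  ${RUN:-} bq rm -f -t "${dataset}.${tmp}"
}

# Dry-run usage:
#   RUN=echo; delete_via_temp_table mydataset mytable 20171207 'id = 2374180'
```

Each step stops the function on failure, so the original partition is only overwritten after the DELETE on the copy has succeeded.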

+3

I have already done this without a temporary table. The steps are:

1) Prepare a query that selects all the rows you want to keep from a specific partition:

 SELECT * FROM `your_data_set.tablename` WHERE _PARTITIONTIME = TIMESTAMP('2017-12-07') AND condition_to_keep_rows_which_should_not_be_deleted = 'condition' 

If necessary, run this for other partitions as well.

2) Choose a Destination table for the result of your query and point it AT THE SPECIFIC PARTITION. You need to specify the table name like this:

 tablename$20171207 

3) Check the option "Overwrite table" → it will overwrite only that specific partition

4) Run the query. As a result, the unwanted rows are removed from the selected partition!

// remember that you may need to run this for other partitions if the rows you are deleting span multiple partitions
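
The same steps can probably be scripted with the bq CLI instead of the web UI, using `--destination_table` with a partition decorator and `--replace` in place of the "Overwrite table" checkbox. This is a sketch under assumptions: the function name overwrite_partition and the example arguments are made up, and the keep-condition is whatever identifies the rows that should survive. Setting RUN=echo prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: rewrite one daily partition with only the rows to keep.
# Assumes the bq CLI is installed and authenticated; the function name
# is hypothetical. Set RUN=echo for a dry run.
overwrite_partition() {
  dataset=$1   # e.g. your_data_set
  table=$2     # e.g. tablename
  day=$3       # partition as YYYYMMDD, e.g. 20171207
  keep=$4      # condition for rows to KEEP, e.g. 'id != 2374180'

  # Turn 20171207 into 2017-12-07 for the _PARTITIONTIME filter.
  iso=$(printf '%s\n' "$day" | sed 's/^\(....\)\(..\)\(..\)$/\1-\2-\3/')

  # --replace overwrites the destination, which here is a single
  # partition addressed by the table$YYYYMMDD decorator.
  ${RUN:-} bq query --use_legacy_sql=false --replace \
    --destination_table="${dataset}.${table}\$${day}" \
    "SELECT * FROM \`${dataset}.${table}\` WHERE _PARTITIONTIME = TIMESTAMP('${iso}') AND ${keep}"
}

# Dry-run usage:
#   RUN=echo; overwrite_partition your_data_set tablename 20171207 'id != 2374180'
```

Note that the condition is inverted compared with a DELETE: it selects the rows to keep, not the rows to remove.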

0

You can delete whole partitions of a partitioned table using the bq rm command, for example:

 bq rm 'mydataset.mytable$20160301' 
0

Source: https://habr.com/ru/post/1264955/

