Updating rows in SQL every 20 iterations

Question

Updating rows in SQL every 20 iterations

I have a table with ~ 1 million rows in Google BigQuery from the NYC Yellow TaxiCab public dataset . As can be seen from this link, the scheme does not have a primary key. Each row represents a trip / transaction, but there is no field customer_id.

I want to add a column customer_idand distribute random numbers to it so that:

For rows 1-20, `customer_id` should be assigned `1`
For rows 21-40, `customer_id` should be assigned `2`
and so on..

In other words, I want the exact (and any) 20 rows in the table to have a specific meaning customer_id.

+4

sql mysql google-bigquery

Akshay apte 20 sept. '17 at 7:20

source share

2 answers

SQLQuery Standard 20 customer_id

#standardSQL
SELECT DIV(ROW_NUMBER() OVER() - 1, 20) AS customer_id, *
FROM `yourTable`
-- ORDER BY customer_id

#standardSQL
WITH `yourTable` AS (
    SELECT login
    FROM `ghtorrent-bq.ght_2017_04_01.users`
    LIMIT 1000000
)
SELECT DIV(ROW_NUMBER() OVER() - 1, 20) AS customer_id, *
FROM `yourTable`
-- ORDER BY customer_id

- customer_id

#standardSQL
WITH `yourTable` AS (
    SELECT login
    FROM `ghtorrent-bq.ght_2017_04_01.users`
    LIMIT 1000000
)
SELECT cnt, COUNT(1) AS distribution FROM (
  SELECT customer_id, COUNT(1) AS cnt FROM (
    SELECT *, DIV(ROW_NUMBER() OVER() - 1, 20) AS customer_id
    FROM `yourTable`
    ORDER BY customer_id
  )
  GROUP BY customer_id
)
GROUP BY cnt
ORDER BY cnt

Row cnt distribution     
--- --- ------------
1    20        50000

+1

Mikhail Berlyant 20 . '17 16:56

Felipe Hoffa · Accepted Answer · 2017-09-20T07:54:53+0000

Assign a random identifier to each row, obtaining groups of ~ 20 rows for each new_id:

#standardSQL
SELECT CAST(FLOOR(COUNT(*) OVER()/20*RAND()) AS INT64) new_id, *
FROM (
  SELECT login
  FROM `ghtorrent-bq.ght_2017_04_01.users`
  LIMIT 1000000
)

, 50 000 "customers_ids" :

Updating rows in SQL every 20 iterations

More articles: