Delete rows with rare values from a Firebase Analytics table exported to BigQuery

Before sharing a Firebase Analytics table in BigQuery with third parties, I need to delete the rows whose city appears in fewer than 5 rows. How can I do this without losing the structure of the table?

Challenges:

  • Firebase Analytics data in BigQuery has nested rows, and I don’t want to lose the nested structure.
  • Some cities in different regions have the same name, so when counting I must use at least two fields (city, region).
  • Sometimes the city and/or region is empty. I do not want to lose these rows.
2 answers

With legacy SQL:

SELECT *,
  -- Build one key from city and region; NULLs become '_'
  IFNULL(user_dim.geo_info.city,'_')+IFNULL(user_dim.geo_info.region,'_') cityregion
FROM [dataset.app_events_20160607]
HAVING cityregion NOT IN (
  -- Keys that occur in too few rows
  SELECT cityregion FROM (
    SELECT COUNT(*) c, IFNULL(user_dim.geo_info.city,'_')+IFNULL(user_dim.geo_info.region,'_') cityregion
    FROM [dataset.app_events_20160607]
    GROUP BY 2
    -- Fewer than 6 rows; use c<5 to match "fewer than 5 rows" in the question
    HAVING c<6
  )
)

How it works:

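  • SELECT *, [...] cityregion adds a new column, cityregion, to every row.
  • IFNULL(..., '_') replaces null values with '_', so rows with an empty city or region still get a key and are counted like any other.
  • HAVING cityregion NOT IN keeps only the rows whose cityregion is not in the list returned by the subquery.
  • The subquery returns every cityregion that appears in fewer than 6 rows.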
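
To make the key construction concrete, here is a small sketch in BigQuery standard SQL (not the legacy dialect used above) over made-up inline rows. The `sample` table, its values, and the `cityregion_sep` column are hypothetical and only for illustration. It also shows one caveat: without a separator, two different (city, region) pairs can collapse into the same key.

```sql
-- Hypothetical inline data; not taken from a real export.
WITH sample AS (
  SELECT 'ab' AS city, 'c' AS region UNION ALL
  SELECT 'a', 'bc' UNION ALL
  SELECT CAST(NULL AS STRING), 'c'
)
SELECT
  city,
  region,
  -- Same construction as in the answer: NULL becomes '_'.
  -- ('ab','c') and ('a','bc') both yield 'abc'; (NULL,'c') yields '_c'.
  CONCAT(IFNULL(city, '_'), IFNULL(region, '_')) AS cityregion,
  -- Assumed variant with a '|' separator so the pairs stay distinct:
  -- 'ab|c', 'a|bc', '_|c'.
  CONCAT(IFNULL(city, '_'), '|', IFNULL(region, '_')) AS cityregion_sep
FROM sample
```

Whether such collisions matter depends on the city and region names that actually occur in the data; adding a separator that cannot appear in either field is a cheap safeguard.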

The same approach, with standard SQL:

SELECT *
FROM `dataset.app_events_20160607`
WHERE CONCAT(IFNULL(user_dim.geo_info.city,'_'), IFNULL(user_dim.geo_info.region,'_')) NOT IN (
  SELECT cityregion FROM (
    SELECT COUNT(*) c, CONCAT(IFNULL(user_dim.geo_info.city,'_'), IFNULL(user_dim.geo_info.region,'_')) cityregion
    FROM `dataset.app_events_20160607`
    GROUP BY 2
    HAVING c<6
  )
)
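
As an alternative sketch (not part of either answer), a window function can count rows per (city, region) directly in standard SQL, which avoids building a concatenated key. The inline `events` table, the `rows_in_group` column, and the threshold of 5 are assumptions for illustration; against a real export, `events` would be replaced with the actual table.

```sql
-- Hypothetical inline data shaped like the nested user_dim.geo_info record:
-- 5 rows for Springfield/Illinois, 1 row for Springfield/Oregon,
-- and 5 rows where both city and region are NULL.
WITH events AS (
  SELECT STRUCT(STRUCT('Springfield' AS city, 'Illinois' AS region) AS geo_info) AS user_dim
  FROM UNNEST(GENERATE_ARRAY(1, 5))
  UNION ALL
  SELECT STRUCT(STRUCT('Springfield' AS city, 'Oregon' AS region) AS geo_info) AS user_dim
  UNION ALL
  SELECT STRUCT(STRUCT(CAST(NULL AS STRING) AS city, CAST(NULL AS STRING) AS region) AS geo_info) AS user_dim
  FROM UNNEST(GENERATE_ARRAY(1, 5))
)
-- Drop the helper column so the output keeps the original table structure.
SELECT * EXCEPT (rows_in_group)
FROM (
  SELECT
    *,
    -- Rows sharing the same city and region form one partition;
    -- NULL values are grouped together, so empty cities are counted too.
    COUNT(*) OVER (
      PARTITION BY user_dim.geo_info.city, user_dim.geo_info.region
    ) AS rows_in_group
  FROM events
)
-- Keep only groups with at least 5 rows (assumed threshold).
WHERE rows_in_group >= 5
```

On this sample the single Springfield/Oregon row is filtered out, while the Illinois rows and the rows with an empty city and region are kept. As in the answers above, a group with an empty city or region is still dropped if it has too few rows.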

Source: https://habr.com/ru/post/1649587/