Trying to work with large data sheets

I have an automatic toolbar in the "My dashboard" sheet in Google Drive.

Toolbar data comes from Google Analytics (GA) through the API. Most of the data that I was able to use using Google Sheets GA added.

My initial data for one of the tables in the toolbar is quite large - too large to fit the sheet itself.

So, with some limited scripting skills and with the help of this forum and some online tutorials, I created a script in Google-Apps-Script that requests the GA api and returns all the data I need, and then puts it in a csv file in the same directory as the main panel.

So, now in my "dashboard" folder on Drive, I have 2 files: "my panel" - a sheet and "my data" - a csv file. I could, if I wanted to, instead display the results of calling api on a worksheet, I just assumed that the csv file would be more efficient.

I opened the csv file of my data in Gsheet and called it "combo". Here is an example of how the data looks:

ga:year ga:month ga:medium ga:source ga:campaign ga:goal1Completions 2013 5 (none) (direct) (not set) 116 2013 5 (not set) Adperio silvercontact?dp 0 2013 5 (not set) Conde*it _medium=email 0 2013 5 (not set) Hearst (not set) 0 2013 5 (not set) stackersocial stackersocial 0 2013 5 12111 9591 201fHN000xRGGszT3aEAA11uIsaT000. 0 2013 5 12111 9591 201fHN00BrT.K.AY0Hvf3q1uIqgl000. 0 2013 5 12111 9591 201fHN00CK619oBE3OsBZp1uIqGX000. 0 2013 5 12111 9591 201fHN00DFXJxnUu1jx25M1uIzkC000. 0 

There are ~ 65 thousand rows of data.

Now, in my toolbar, I need a table that groups and aggregates data in a “combo” sheet (or could I use CSV somehow?). My preferred option for moving to a formula is usually, for example,

=sum(filter(ga:goal1Completions, ga:year="2015"... ))

It’s difficult to pull and request this data, and I hope for some tips.

I can’t just import data into my toolbar, as I get a warning about exceeding the maximum sheet size of 200 thousand cells. So I have to rely on a formula to import data and run the calculation every time. Here is an example of what I'm using now:

 =SUM(filter(IMPORTRANGE("https://docs.google.com/spreadsheets/d/xxxxxxxx/edit?usp=sharing","combined!F2:F"), IMPORTRANGE("https://docs.google.com/spreadsheets/d/xxxxxxxx/edit?usp=sharing","combined!A2:A")=year(G$17), IMPORTRANGE("https://docs.google.com/spreadsheets/d/xxxxxxxx/edit?usp=sharing","combined!B2:B")=month(G$17), IMPORTRANGE("https://docs.google.com/spreadsheets/d/xxxxxxxx/edit?usp=sharing","combined!C2:C")="(direct)", IMPORTRANGE("https://docs.google.com/spreadsheets/d/xxxxxxxx/edit?usp=sharing","combined!D2:D")="(none)")) 

So, for each parameter in a function in a cell, I have to import a data column and a subset.

I am sure there must be a better way. Now it really works, but it is exceptionally slow.

The avenue I was thinking about:

  • Someone from SO mentioned using the caching service here . But how will this work or how will I integrate it with my IMPORTRANGE () functions above?
  • Is there any benefit from having a csv file, or should I just output the results to GSheet from the start?
  • In a script, I run to get the data before outputting it to a csv file (or Gsheet, if this is better?) I have the data as an array before converting to a file. Are there any fantasies I can do here, for example, query this array directly from a sheet? Whereas my scripting skills are pretty simple.
  • Presumably, I could create a function to call GA api separately for each cell in which the function is located, returning the results in each individual cell (so, in the above example, a function to call ga api for which year = year (G17) & month = month (G17) & average = [some_other_cell_reference). Thus, this parameter will result in more api calls, but will lead to less data to work with. Not sure if this is a good idea or not in the right direction.

I hope that I have outlined my problem in sufficient detail. I need to find a more efficient way to request my external data in the control panel.

+6
source share
2 answers

Instead of importing one column at a time using multiple IMPORTRANGE and FILTER, use one single IMPORTRANGE to pull out all the columns you need in one step, and use QUERY to perform the necessary operations.

Just like you enable IMPORTRANGE inside FILTER, you can enable it in QUERY

those. For simplicity, I will add only one filter criterion.

 `=QUERY(IMPORTRANGE("key","range"),"select sum(Col6) where Col1 ='"&G$17&'")` 
+2
source

Since the source data is too large for Google Sheets, you can use an alternative data store that will allow you to perform interesting queries - for example, Fusion tables.

You can access Fusion spreadsheets from Google Apps Script using the advanced FusionTables service, which should be enabled first .

In the database abstraction with google script applications , Bruce MacPherson introduces an abstraction layer, which can have many services as basic depots, this will take care of the details of working with FusionTables for you. Even better, you can reorganize your code to first use the cDbAbstraction library with existing Google Sheet content, and then add DriverFusion for a complete dataset later.

+2
source

Source: https://habr.com/ru/post/988855/


All Articles