BigQuery vs Google Analytics: which data is more accurate?

As a Google Analytics Premium / BigQuery customer, our question is: which data is more accurate?

I tend to consider BigQuery more accurate, because we can see the raw data, whereas we don’t know what method Google Analytics uses to calculate its numbers.

I also think a lot of this has to do with SAMPLING.

When you calculate something as simple as total pageviews for a single page, the Google Analytics numbers line up with BigQuery to within 0.00001%:

  sum(case when regexp_match(hits.page.pagePath, r'(?i:/contact.aspx)') and hits.type = "PAGE" then 1 else 0 end) as total_pageviews

When you compute something more complex, like unique pageviews for a single page, the Google Analytics numbers are 5% greater than BigQuery's. Note that this query uses the maximum sampling threshold of 1 million:

  count(distinct (case when regexp_match(hits.page.pagePath, r'(?i:/contact.aspx)') and hits.type = "PAGE" then concat(fullVisitorId, string(visitId)) end), 1000000) as unique_pageviews
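
To make the two definitions concrete, here is a minimal Python sketch of what the two expressions above compute, run over a handful of made-up hit rows. The row layout, the uppercase "PAGE" hit type value, and the `count_pageviews` helper are assumptions for illustration; none of them come from a Google library.

```python
import re

# Hypothetical, made-up hit rows: (fullVisitorId, visitId, hit type, pagePath).
HITS = [
    ("111", 1, "PAGE", "/contact.aspx"),
    ("111", 1, "PAGE", "/Contact.aspx"),   # same session, second view
    ("111", 2, "PAGE", "/contact.aspx"),   # same visitor, new session
    ("222", 7, "EVENT", "/contact.aspx"),  # not a pageview hit
    ("222", 7, "PAGE", "/home.aspx"),      # different page
]

# Same pattern as in the queries above (case-insensitive, unanchored).
PAGE_RE = re.compile(r"(?i:/contact.aspx)")


def count_pageviews(hits):
    """Return (total_pageviews, unique_pageviews) for the matching page."""
    total = 0
    sessions = set()
    for visitor_id, visit_id, hit_type, path in hits:
        if hit_type == "PAGE" and PAGE_RE.search(path):
            total += 1
            # One unique pageview per session: visitor id + visit id.
            sessions.add(visitor_id + str(visit_id))
    return total, len(sessions)


total_pageviews, unique_pageviews = count_pageviews(HITS)
print(total_pageviews, unique_pageviews)
```

The set in this sketch is exact. A distinct count that switches to an approximation above some threshold can drift from it, which is one possible source of a gap on the unique metric while the plain sum still matches.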

I’d like to know what others think or what Google developers themselves can explain.

+6
2 answers

If you are a Premium customer, I assume you have a large view with a lot of data. The Google Analytics API will sample your data if there is a lot of it; you can try to prevent this by raising the sampling level. Even with the sampling level set to higher precision, you may still receive sampled data from the API.

Check the JSON returned from the API; it tells you whether your data was sampled.
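
A minimal Python sketch of that check, assuming a Core Reporting API v3 style response in which `containsSampledData`, `sampleSize`, and `sampleSpace` are top-level fields. The payload below is made up for illustration, and `sampling_summary` is a hypothetical helper, not part of any Google client library.

```python
import json


def sampling_summary(response):
    """Describe whether a report response was sampled."""
    if not response.get("containsSampledData", False):
        return "unsampled"
    # Assumption: sampleSize / sampleSpace arrive as numeric strings.
    size = int(response.get("sampleSize", 0))
    space = int(response.get("sampleSpace", 0))
    if space == 0:
        return "sampled (sample size not reported)"
    return "sampled: {:.1%} of sessions".format(size / space)


# Made-up payload standing in for a real API response body.
raw = '{"containsSampledData": true, "sampleSize": "250000", "sampleSpace": "1000000"}'
print(sampling_summary(json.loads(raw)))
```

Running this check before comparing a report against BigQuery tells you whether any difference may simply be sampling noise.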

BigQuery will not sample your data. Premium customers also have the ability to use the API with unsampled data, but I think you need to contact Google to set this up.

The big plus with BigQuery is that you are not limited to 7 dimensions and 10 metrics, as you are with the Google Analytics API.

Note: I am not a Google developer, but I am a Google Developer Expert for Google Analytics.

+3

I am a big fan of BigQuery. I have also used Google Analytics quite a lot. So the question is which data is more accurate.

Well, the answer to such a question is always: "the closer the data is to its source, the more accurate it is." BigQuery is the primary repository of all Google data. This is where data is collected, indexed, and then made available through a SQL interface.

Google Analytics is a tool that has been developed with many free accounts in mind. To support free GA accounts, you need to scale well. To scale, companies optimize storage by pre-aggregating data.

So you are really comparing two things: pre-aggregated data (GA) and raw, accumulated data (BigQuery). Which would you trust?

Now, it seems, there is a second question: "How do I get accurate aggregates from BigQuery?" BigQuery is full of ANSI SQL incompatibilities that are hard to remember when writing ad hoc queries. You are better off putting a BI tool on top of BigQuery so you can compare the data like for like (i.e. with the same thresholds and rounding).

+1

Source: https://habr.com/ru/post/976838/