We performed 2 queries on two different pairs of tables (several connections for receiving updated data, etc.), the same format (both in the table and in the query) and purpose. The difference between the requests is the string variable contained in the REGEXP_MATCH string (Regex formulas have the same format, only a different kernel string). The rest is exactly the same, except, of course, the data contained.
Despite the fact that the query processed almost 2 GB of data in 20-50 seconds in one pair of tables, the same query with a different REGEX parameter (the same column) processes another pair of tables of 250 MB in 100 seconds (sometimes even From 500 to 1000+ seconds). Both queries are executed interactively without caching the results.
What could be causing this and is there a fix?
How is it possible that a smaller table needs a significant amount of processing time compared to a significantly larger table, given that the queries being performed are basically the same?
Sorry for the mess below, tried to make it as beautiful as possible. So, for a brief briefing: the request intends to create a user funnel based on their events. Real-time data is similar, so we have both updated users and events. The following steps are included:
- Get "done" actions - get users from updated events selected by the first regular expression
- Get actions "don't have to do" - get users from updated events selected by the second event
- Make the difference between 2 with LEFT OUTER JOIN
- USERS ( )
, . .
SELECT Count(*) as count
FROM
(
SELECT final._nid as _nid
FROM (
SELECT did.user as user
FROM (
SELECT events.user as user, events.createdOn as createdOn
FROM [shop1_events] as events
JOIN EACH (
SELECT session, createdOn, MAX(updatedOn) as updatedOn
FROM [shop1_events]
GROUP EACH BY session, createdOn) as latest_events
ON events.session = latest_events.session AND events.createdOn = latest_events.createdOn AND events.updatedOn = latest_events.updatedOn
WHERE ((REGEXP_MATCH(events.category_a , r"([\:\^]100000453[\:\^]|^100000453$|^100000453[\^\:]|[\^\:]100000453$)"))) AND events.type = 10006) as did
LEFT OUTER JOIN EACH (
SELECT events.user as user, events.createdOn as createdOn
FROM [shop1_events] as events
JOIN EACH (
SELECT session, createdOn, MAX(updatedOn) as updatedOn
FROM [shop1_events]
GROUP EACH BY session, createdOn) as latest_events
ON events.session = latest_events.session AND events.createdOn = latest_events.createdOn AND events.updatedOn = latest_events.updatedOn
WHERE ((REGEXP_MATCH(events.category_a , r"([\:\^]100000485[\:\^]|^100000485$|^100000485[\^\:]|[\^\:]100000485$)"))) AND events.type = 10006) as step_not_0
ON did.user = step_not_0.user
WHERE step_not_0.user IS NULL) as funnel
JOIN EACH (
SELECT all._nid as _nid
FROM [shop1_users] as all
JOIN EACH (
SELECT _nid, MAX(updatedOn) as updatedOn
FROM [shop1_users]
GROUP EACH BY _nid) as latest
ON all._nid = latest._nid AND all.updatedOn = latest.updatedOn
) as final
ON final._nid = funnel.user
GROUP EACH BY _nid) as counting;