Vertica JOIN fails with "inner partition did not fit in memory"

I have a problem with a large query across ten related tables. I am migrating data from a fact table (f1) into a star schema. I start by populating the dimension tables from f1, and then populate the new fact table (f2) with a join to the dimension tables to pick up the corresponding IDs.

Unfortunately, the query fails with the error "inner partition did not fit in memory." In the log I see:

    2012-10-18 16:20:31.607 Init Session:0x2aac6c02b250 [EE] <INFO> ENABLE_JOIN_SPILL may allow this query to run, with reduced performance
    2012-10-18 16:20:31.607 Init Session:0x2aac6c02b250 [EE] <INFO> Query Retry action: Setting add_vertica_options('EE','ENABLE_JOIN_SPILL');
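As I read it, that retry hint amounts to enabling join spill for the session before running the query (the function call below is quoted verbatim from the retry hint in the log):

    -- Allow the join to spill to disk, trading speed for memory;
    -- call taken verbatim from the retry hint in the log
    SELECT add_vertica_options('EE', 'ENABLE_JOIN_SPILL');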

but this does not work either, since I then get:

    2012-10-18 16:23:31.138 Init Session:0x2aac6c02b250 [EE] <INFO> Join ((public.owa_search_term_dim x public.page_impressions_with_session) using owa_search_term_dim_projection_node0001 and previous join (PATH ID: 7)) inner partition did not fit in memory; value
    2012-10-18 16:23:31.138 Init Session:0x2aac6c02b250 [EE] <INFO> Query Retry action: Swapping join order with override: 1|7|0

This goes on for some time, with Vertica apparently trying to find a way to perform the join, but it eventually gives up and errors out, stating that the join does not fit in memory.

Are there any tips on how to reduce the amount of memory a join needs, or on why the spill does not work? I can live with the performance hit; I just need the query to complete.

2 answers

What I did to get around this error ...

  • Rewrite the query
    Sometimes the initial query is not as optimized as it could be. One way I approach this is with subqueries.
  • Use temporary tables
    Some of the reports I had to generate work very well using temporary tables. This is a more extreme version of using subqueries (see the first sketch after this list).
  • Additional filters
    Sometimes small things, such as adding extra filters and applying them to the joined tables as well, make the difference between a 5-minute OOM query and a 30-second working one.
  • Limit data
    Take subsets of the data in several steps. Like additional filters, working on subsets reduces the resources Vertica needs, which lets the query succeed. I often do this for date-based aggregation: I roll up day -> month -> year (see the second sketch after this list). This subsetting never fails, and I get the exact yearly rollup where aggregating straight to the year would never complete.
  • Projections
    Using query-specific projections can help Vertica use fewer resources (see the third sketch after this list).
  • Explain plan
    These are the two main things I get out of reviewing the explain plan:
    A) Verify that Vertica is using the projections I expect, e.g., the query-specific projections created for performance. If it is not, I can revisit my expectations and assumptions about the query.
    B) Verify that maximum filtering is applied to all tables. In some of my more complex subqueries, I found that the date filter was not being pushed down to all tables. Once I fixed that, performance was an order of magnitude faster (the 5-minutes-to-30-seconds case above).
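Here is a minimal sketch of the temp-table approach. Apart from the two tables named in the asker's log, all table and column names (tmp_search_terms, f2, search_term_id, session_id) are hypothetical; the idea is to materialize the dimension lookup once, then load the fact table against the much smaller temp table:

    -- Stage the dimension lookup in a session-local temp table
    -- (hypothetical column names)
    CREATE LOCAL TEMPORARY TABLE tmp_search_terms
        ON COMMIT PRESERVE ROWS AS
    SELECT search_term_id, search_term
    FROM public.owa_search_term_dim;

    -- Load the new fact table against the small temp table instead of
    -- resolving the dimension inside one big ten-table join
    INSERT INTO f2 (session_id, search_term_id)
    SELECT p.session_id, t.search_term_id
    FROM public.page_impressions_with_session p
    JOIN tmp_search_terms t ON t.search_term = p.search_term;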
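And a sketch of the stepwise date aggregation, again with hypothetical column names (event_time, impressions); each step works on a much smaller input than a single one-shot yearly aggregation would:

    -- Step 1: day-level rollup straight from the fact table
    CREATE LOCAL TEMPORARY TABLE agg_day ON COMMIT PRESERVE ROWS AS
    SELECT event_time::DATE AS event_day, COUNT(*) AS impressions
    FROM public.page_impressions_with_session
    GROUP BY event_time::DATE;

    -- Step 2: month-level rollup over the (small) daily table
    CREATE LOCAL TEMPORARY TABLE agg_month ON COMMIT PRESERVE ROWS AS
    SELECT DATE_TRUNC('month', event_day) AS event_month,
           SUM(impressions) AS impressions
    FROM agg_day
    GROUP BY DATE_TRUNC('month', event_day);

    -- Step 3: yearly result from the monthly table
    SELECT DATE_TRUNC('year', event_month) AS event_year,
           SUM(impressions) AS impressions
    FROM agg_month
    GROUP BY DATE_TRUNC('year', event_month);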
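Finally, a sketch of a query-specific projection plus the explain check. The projection definition (name, columns, sort key, segmentation) is hypothetical and would have to match the real join keys:

    -- Hypothetical projection sorted and segmented on the join key,
    -- so the join can be resolved with fewer resources
    CREATE PROJECTION page_impr_by_term AS
    SELECT session_id, search_term, event_time
    FROM public.page_impressions_with_session
    ORDER BY search_term
    SEGMENTED BY HASH(search_term) ALL NODES;

    SELECT REFRESH('public.page_impressions_with_session');

    -- Then confirm the plan actually uses the new projection
    EXPLAIN
    SELECT COUNT(*)
    FROM public.page_impressions_with_session p
    JOIN public.owa_search_term_dim d
      ON d.search_term = p.search_term;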

Using these steps, I have not yet hit a situation where I couldn't get a result. Sometimes it takes a while, though. I have one set of queries that pumps through a series of 14 temporary tables and ends with a very small result set, but it takes more than 15 minutes to run because of the sheer amount of number crunching that has to be done.


Nii's answer is the best one, but here's one more suggestion to consider: get more memory. Sometimes you simply outgrow your system.
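If you suspect that, a quick way to see how much memory the query's resource pool actually has is the monitoring views; a sketch, assuming the standard v_monitor.resource_pool_status view:

    -- Check how much memory each resource pool has available
    SELECT pool_name, memory_size_kb, max_memory_size_kb
    FROM v_monitor.resource_pool_status;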

His suggestion to use temporary tables is something I have used in the past too, although I haven't run into this problem in quite a while; that's largely because our system doesn't do many joins.

