PROJECTION in the Vertica database

Can someone explain to me the concept of PROJECTION in a Vertica database with an example query?

5 answers

Vertica does not use indexes to find data.

Conceptually, you still access tables using SQL. But under the hood, the data in the table is stored in projections that can be optimized for different queries.

I like to think of it as a table representing a deck of cards. If you play poker, you can still say something like

Select * from CardDeck limit 5; 

Suppose you have a table defined with the following columns:

 FaceValue int,     -- let's just assume face values are ints
 Suit varchar(10)

Then I can create my projections (I'm omitting the details about segmentation, superprojections, buddy projections, etc.).

 create projection CardDeck_p1 (
   FaceValue ENCODING RLE,
   Suit
 )
 as select FaceValue, Suit from CardDeck order by FaceValue;

 create projection CardDeck_p2 (
   FaceValue,
   Suit
 )
 as select FaceValue, Suit from CardDeck order by Suit;

Each column can get a different type of encoding, which is defined in the projection. The Database Designer, which I haven't used much because I worked on an older version, can help design projections for you.

So, returning to the deck-of-cards analogy, imagine that you want to access the deck, but you need the cards in different shuffles. Projections in Vertica give you those different shuffles. Tables are really just a construct that lets you access the data stored in projections. But when you write SQL, you access tables.
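To make the analogy concrete, here is a small sketch of two queries against the CardDeck table above. The queries are illustrative only. Which projection the optimizer actually picks depends on the Vertica version, statistics, and data, so the comments describe likely candidates rather than guaranteed behavior.

```sql
-- Filters on FaceValue: a projection sorted by FaceValue (such as CardDeck_p1)
-- is a natural candidate, because matching rows are stored next to each other.
SELECT FaceValue, Suit
FROM CardDeck
WHERE FaceValue = 10;

-- Groups by Suit: a projection sorted by Suit (such as CardDeck_p2)
-- is a natural candidate.
SELECT Suit, COUNT(*) AS cards
FROM CardDeck
GROUP BY Suit;
```

Both queries name only the table; neither mentions a projection.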


From the Concepts Guide.pdf (about p. 23) of the Vertica documentation.

Projections store data in a format that optimizes query execution. They are similar to materialized views in that they store result sets on disk rather than computing them each time they are used in a query.

and

Projections are transparent to end users of SQL. The Vertica query optimizer automatically picks the best projections to use for any query.

All you need to do for a projection to improve query performance is create it. Vertica will automatically select the best projection to use for a given query. (Note: you can also query a specific projection directly instead of the table.)

I don't know how far your understanding of projections goes, but more specific questions would make it possible to cover particular points in more detail. If you want a general overview of the concepts, I would recommend getting and reading the Concepts Guide.pdf. http://my.vertica.com


I want to emphasize the point made in geoff's answer: projections are physical structures on disk. Defining multiple projections for a table can improve query performance, but at the cost of more disk space and slower load times (since each row has to be written to every projection).

There are superprojections, which store all the columns of the table, and partial projections. You would use a partial projection when the query you want to support or optimize needs only a subset of the table's columns. Each table requires at least one superprojection. If you do not define one, Vertica will provide a default one, which may perform very poorly.
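As a rough sketch of a partial projection, the example below reuses the CardDeck table from the first answer. The projection name CardDeck_suits is made up for illustration, and UNSEGMENTED ALL NODES (a full copy on every node) is an assumption that only makes sense for a small table.

```sql
-- Partial projection: stores only the Suit column, sorted by Suit.
-- CardDeck_suits is a hypothetical name; the UNSEGMENTED ALL NODES clause
-- is an assumption suitable for a small table.
CREATE PROJECTION CardDeck_suits (Suit ENCODING RLE)
AS SELECT Suit FROM CardDeck
ORDER BY Suit
UNSEGMENTED ALL NODES;

-- A projection created after the table already holds data
-- has to be refreshed before the optimizer can use it.
SELECT REFRESH('CardDeck');

-- This query touches only Suit, so it can be answered from the partial projection.
SELECT Suit, COUNT(*) AS cards
FROM CardDeck
GROUP BY Suit;
```

A query that also needs FaceValue cannot use CardDeck_suits and falls back to a projection that contains both columns.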

The recommended practice is to have the Database Designer analyze your table with sample data and sample queries, after which it can propose projections for you. I personally have not had great results with it, though knowing how to use the DBD tool should be part of the curriculum for anyone taking Vertica training.


You seem to be familiar with views. Projections are conceptually similar to views in many ways; both cache something, but at different levels. In short, views cache queries, and projections cache query results.

Views cache query statements. You give a predefined query a name and then call it by that name. The query is not executed when the view is created. Queries that go through a view get no performance improvement, since they run as regular queries.

Projections cache query results. A projection's query is executed when the projection is created, and the results are persisted in storage. When you run a query that can use those results, Vertica uses the projection to answer it and thereby improves query performance. After creating projections you don't need to do anything special; Vertica automatically selects a projection whenever the query can benefit from it. A projection can serve a query when, for example, the query uses a subset of the projection's columns, has the same sort order, and so on.
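The contrast can be sketched with the CardDeck table from the first answer. The object names CardDeck_hearts_v and CardDeck_by_suit_p are hypothetical, and the example is a minimal illustration rather than a recommended design.

```sql
-- A view stores only the query text.
-- The SELECT below runs every time the view is queried.
CREATE VIEW CardDeck_hearts_v AS
SELECT FaceValue, Suit
FROM CardDeck
WHERE Suit = 'Hearts';

-- A projection stores the rows themselves on disk, already sorted by Suit.
-- Queries keep naming CardDeck; the optimizer decides whether to use it.
CREATE PROJECTION CardDeck_by_suit_p (FaceValue, Suit)
AS SELECT FaceValue, Suit
FROM CardDeck
ORDER BY Suit;
```

Querying CardDeck_hearts_v still reads from whichever projection of CardDeck the optimizer picks, so the view itself adds no storage and no speedup.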

As with views, in a projection you can select a subset of a table's columns, join with other tables, and sort by specific columns. However, each projection takes up its own space to store the query result; the more projections you create, the more space they consume. Projections are updated automatically when the underlying source tables are updated. The update process runs in the background and can take a long time, depending on the complexity of the query and the size of the data. Projections are therefore better suited to read-heavy workloads than to write-heavy ones. In terms of use cases, they fit reporting better than real-time web dashboards.

As an implementation detail, tables in Vertica are purely logical. All of a table's data is stored in its associated superprojection. A superprojection contains all the columns of the table and is created automatically by default. All other projections are derived from the superprojection.

Vertica decides which projections to use for a query, but you can also name a projection directly to force Vertica to use it:

 -- List all projections
 SELECT projection_name FROM projections;

 -- Force use of the super projection; _super is the suffix of the super projection
 SELECT * FROM FACT_TABLE_super;

You can use the EXPLAIN statement to see which projections are used in the query plan. This will help you improve the performance of your query.
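A minimal sketch of that check, assuming the CardDeck table from the first answer exists. The exact plan text varies between Vertica versions, so no sample output is shown here.

```sql
-- Show the query plan without running the query.
-- The plan text typically names the projection chosen for each table access.
EXPLAIN
SELECT Suit, COUNT(*) AS cards
FROM CardDeck
GROUP BY Suit;

-- For comparison, list the projections anchored on the table.
-- ILIKE is used so the match does not depend on identifier case.
SELECT projection_name, is_super_projection
FROM v_catalog.projections
WHERE anchor_table_name ILIKE 'CardDeck';
```

If the plan names a projection other than the one you expected, compare its sort order and column list against what the query filters and groups on.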


Source: https://habr.com/ru/post/913543/

