Are indexes really needed in the datastore?

I am a little confused by part of the GAE documentation. I intend to add indexes to optimize my application's performance anyway, but I wanted to clarify whether indexes exist only for that purpose or whether they are actually required.

Queries cannot find property values that are not indexed. This includes properties marked as not indexed, as well as properties whose values are of the long text type (Text) or the long binary type (Blob).

A query with a filter or sort order on a property will never match an entity whose value for that property is a Text or Blob, or which was written with that property marked as not indexed. Properties with such values behave as if the property were not set, as far as query filters and sort orders are concerned.

from http://code.google.com/appengine/docs/java/datastore/queries.html#Introduction_to_Indexes
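To make the quoted rule concrete, here is a toy, in-memory model of an index-backed query. It is not the App Engine API: `write_entity`, `query_eq`, `query_sorted` and the `Text` stand-in are hypothetical names for this sketch. The assumption it illustrates is that a query only ever walks index rows, so an entity written without an index row for a property looks as if the property were unset.

```python
from collections import defaultdict

# Toy model, not the App Engine API: each property name maps to a list of
# (value, entity_key) index rows. Queries read only these rows.
index = defaultdict(list)
entities = {}


class Text(str):
    """Stand-in for the datastore's long Text type, which is never indexed."""


def write_entity(key, props, unindexed=()):
    """Store an entity; add index rows only for indexable property values."""
    entities[key] = props
    for name, value in props.items():
        if name in unindexed or isinstance(value, Text):
            continue  # no index row, so queries cannot see this value
        index[name].append((value, key))


def query_eq(name, value):
    """Equality filter answered purely from index rows."""
    return sorted(key for v, key in index[name] if v == value)


def query_sorted(name):
    """Sort order on a property; entities without an index row are skipped."""
    return [key for _, key in sorted(index[name])]


write_entity("a", {"score": 5})
write_entity("b", {"score": 5}, unindexed={"score"})  # like indexed=False
write_entity("c", {"score": 3, "body": Text("long text")})

print(query_eq("score", 5))           # ['a']
print(query_sorted("score"))          # ['c', 'a']
print(query_eq("body", "long text"))  # []
```

Entity `b` really stores `score = 5`, yet neither the filter nor the sort order sees it, which is what "behave as if the property were not set" means in practice.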

The first paragraph makes me think that you simply cannot sort or filter on unindexed properties at all. However, the second paragraph makes me think that this restriction applies only to Text or Blob properties, or to properties explicitly annotated as unindexed.

I am asking about the distinction because I have some numeric and string fields, with no index specified for them, that I currently sort and filter on in production. These queries run in a background task that mostly does not care about performance (in this case I would rather optimize for size and cost). Am I just lucky that they return the correct data?

4 answers

In the GAE datastore, single-property indexes are created automatically for every property that is not unindexable (either explicitly marked as unindexed or of one of those types).
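As a rough sketch of this rule, assuming a model described as a plain dict (the `builtin_indexes` function and the property names below are hypothetical), the properties that receive an automatic index are everything except Text/Blob values and properties explicitly set to `indexed=False`:

```python
# Types that can never get an index, per the quoted documentation.
UNINDEXABLE_TYPES = {"Text", "Blob"}


def builtin_indexes(model):
    """Return property names that receive an automatic single-property index.

    `model` maps a property name to (type_name, indexed), where indexed is
    True, False, or None; None means "not specified" and is treated like
    the default of indexed=True.
    """
    names = []
    for name, (type_name, indexed) in model.items():
        if type_name in UNINDEXABLE_TYPES or indexed is False:
            continue
        names.append(name)
    return sorted(names)


# Hypothetical model resembling the question: numeric and string fields with
# no indexing option given, plus a Text field and an explicitly unindexed one.
hypothetical_model = {
    "count": ("Integer", None),
    "title": ("String", None),
    "body": ("Text", None),
    "num": ("Integer", False),
}

print(builtin_indexes(hypothetical_model))  # ['count', 'title']
```

Under this reading, the numeric and string fields in the question are indexed by default, so the queries returning correct data is expected rather than luck.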

The wording in that documentation is, I admit, a bit confusing.

You need to specify indexes explicitly if you want to index more than one property together (for example, to sort by two different properties).
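A minimal sketch of why a multi-property sort needs its own index, assuming a composite index can be modeled as sorted tuples of property values followed by the entity key (the function names and sample data are hypothetical): once such rows exist, sorting by both properties is just a scan in index order.

```python
def build_composite_index(entities, props):
    """Build a toy composite index: one sorted row per entity that has every
    property in `props`, shaped (value_1, value_2, ..., entity_key)."""
    rows = [
        tuple(values[p] for p in props) + (key,)
        for key, values in entities.items()
        if all(p in values for p in props)
    ]
    return sorted(rows)


def sorted_keys(composite_rows):
    """A multi-property sort is a scan of the composite index in order."""
    return [row[-1] for row in composite_rows]


people = {
    "x": {"last": "Smith", "first": "Ann"},
    "y": {"last": "Doe", "first": "Bob"},
    "z": {"last": "Smith", "first": "Al"},
}

rows = build_composite_index(people, ["last", "first"])
print(sorted_keys(rows))  # ['y', 'z', 'x']
```

Two separate single-property indexes cannot produce this order directly, which is why such a sort requires a declared index.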


In GAE, unfortunately, if a property is marked as unindexed

num = db.IntegerProperty(required=True, indexed=False) 

then it cannot be included in a custom (composite) index either. This is counterproductive: most of the built-in indexes are never used by my code, yet they take up a lot of storage. But that is how GAE works.

Datastore Indexes - Unindexed Properties:

Note: If a property appears in an index composed of multiple properties, then setting it to unindexed will prevent it from being indexed in the composite index.
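The quoted note can be sketched with a toy composite index, assuming each entity records which of its properties were written as unindexed (the `composite_rows` function and entity data are hypothetical): an entity whose value for any indexed property was written unindexed gets no row, so queries served by that composite index never return it.

```python
def composite_rows(entities, props):
    """Build rows for a toy composite index over `props`.

    Rule taken from the quoted note: if any property in `props` was written
    as unindexed on an entity, that entity gets no row in this index.
    """
    rows = []
    for key, (values, unindexed) in entities.items():
        if any(p not in values or p in unindexed for p in props):
            continue
        rows.append(tuple(values[p] for p in props) + (key,))
    return sorted(rows)


# Hypothetical entities: (property values, set of properties written unindexed).
entities = {
    "e1": ({"kind": "a", "num": 2}, set()),
    "e2": ({"kind": "a", "num": 1}, {"num"}),  # num written with indexed=False
}

print([row[-1] for row in composite_rows(entities, ["kind", "num"])])  # ['e1']
```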


You must define an index if you want to use two or more filters in a single query, e.g.:
Foobar.all().filter('foo =', foo).filter('bar =', bar)

If your query uses only one filter, you do not need to define an index, because the single-property index is generated automatically.
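The rule of thumb in this answer can be sketched as a toy query planner. `plan_query` and `MissingIndexError` are hypothetical names, not SDK calls. This is a simplification: the datastore documentation also describes equality-only filters on several properties being served by merging built-in indexes, so treat the sketch as the answer's rule rather than the datastore's exact behavior.

```python
class MissingIndexError(Exception):
    """Toy error for this sketch; not an App Engine SDK exception."""


def plan_query(filters, declared_indexes):
    """Pick an index for a list of (property, operator) filters.

    Follows this answer's rule of thumb: one filtered property uses the
    automatic single-property index; two or more need a declared composite
    index over exactly those properties.
    """
    props = tuple(sorted({name for name, _ in filters}))
    if len(props) <= 1:
        return "built-in"
    if props in declared_indexes:
        return "composite"
    raise MissingIndexError(f"no composite index declared for {props}")


print(plan_query([("foo", "=")], declared_indexes=set()))  # built-in
print(plan_query([("foo", "="), ("bar", "=")], {("bar", "foo")}))  # composite
```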

Blob and Text properties cannot be indexed at all, even if you list them in index.yaml, so you also cannot filter on them.
For example:
from google.appengine.ext import db

class Foobar(db.Model):
    content = db.TextProperty()

Foobar.all().filter('content =', content)

The query above will cause an error because a TextProperty cannot be indexed and therefore cannot be matched.
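Whatever the SDK does with such a filter, a small client-side guard makes the mistake fail loudly before the query runs. This is a sketch under stated assumptions: `check_filter`, `UnindexedFilterError` and the `foobar_props` mapping of property names to property class names are hypothetical, not part of App Engine.

```python
# Property types that cannot be indexed, per the answer above.
UNINDEXABLE = {"TextProperty", "BlobProperty"}


class UnindexedFilterError(ValueError):
    """Hypothetical error defined by this sketch, not an App Engine exception."""


def check_filter(model_props, filter_string):
    """Validate a db-style filter string such as 'content =' before querying.

    `model_props` maps property names to property class names (an assumption
    of this sketch). Returns the property name if it can be filtered on.
    """
    name = filter_string.split()[0]
    kind = model_props[name]
    if kind in UNINDEXABLE:
        raise UnindexedFilterError(f"cannot filter on {name!r} ({kind})")
    return name


foobar_props = {"content": "TextProperty", "title": "StringProperty"}
print(check_filter(foobar_props, "title ="))  # title
try:
    check_filter(foobar_props, "content =")
except UnindexedFilterError as exc:
    print("rejected:", exc)
```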


Never add a model property without EXPLICITLY specifying indexed=True or indexed=False. Indexes consume significant resources: storage space, write-operation costs, and added latency on put() calls. We never, ever add a property without explicitly stating its indexed value, even when it is indexed=False. It prevents costly oversights and forces you to think about whether or not to index. (At some point you will find yourself cursing the fact that you forgot to override the default of True.) IMHO, the GAE engineers would do everyone a great service by not letting this default to True; if I were them, I would simply not provide a default. HTH. -stevep
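The "no default" idea can be sketched in plain Python with a keyword-only argument that has no default value, so a declaration that forgets `indexed` fails immediately. `Prop` is a hypothetical class for illustration only, not `db.Property`, whose `indexed` argument does default to True.

```python
class Prop:
    """Hypothetical property declaration for this sketch (not db.Property).

    `indexed` is keyword-only with no default, so Python itself refuses a
    declaration that forgets it.
    """

    def __init__(self, type_name, *, indexed):
        if not isinstance(indexed, bool):
            raise TypeError("indexed must be True or False")
        self.type_name = type_name
        self.indexed = indexed


count = Prop("Integer", indexed=False)  # explicit choice: no index

try:
    title = Prop("String")  # forgot indexed=..., rejected at definition time
except TypeError as exc:
    print("rejected:", exc)
```

With real `db` models the same discipline has to come from code review or a lint step, since the SDK will not enforce it.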


Source: https://habr.com/ru/post/1336094/

