Sorting by UUID in Cassandra?

Suppose I have a users column family in which each row key is a unique name plus a preset prefix for a specific client:

<?php uniqid("serverA"); // generates something like: serverA4b3403665fea6 ?>

I can select rows by secondary indexes and so on, for example (the birthdate example from phpcassa):

    $column_family = new ColumnFamily($conn, 'Indexed1');
    $index_exp = CassandraUtil::create_index_expression('birthdate', 1984);
    $index_clause = CassandraUtil::create_index_clause(array($index_exp));
    $rows = $column_family->get_indexed_slices($index_clause);
    // returns an Iterator over:
    //   array('winston smith' => array('birthdate' => 1984))
    foreach ($rows as $key => $columns) {
        // Do stuff with $key and $columns
        print_r($columns);
    }

However, I want the query to return only the 30 most recently added users (created keys) per page, with pagination so that each following page shows older keys.

The only option I have found so far is to use the UUIDs from phpcassa:

uuid1() generates a UUID based on the current time and the MAC address of the machine.

  • Pros: Useful if you want to be able to sort your UUIDs by creation time.
  • Cons: a potential privacy leak, since it reveals which machine created it and at what time.
  • Collisions are possible: if two UUIDs are generated at the same time (within 100 ns) on the same machine. (Or in several other unlikely edge cases.)
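To illustrate the "sort by creation time" property, here is a minimal sketch in Python (whose standard uuid module uses the same uuid1() naming) rather than phpcassa. The helpers `uuid1_from_datetime` and `uuid1_to_datetime` and the fixed node value are hypothetical, introduced only so the example is deterministic. Note that the timestamp is spread over three fields of the UUID, so ordering has to use the embedded timestamp rather than the plain string form.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUIDs count 100 ns intervals since the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_from_datetime(dt, node=0x0123456789AB, clock_seq=0):
    """Build a version 1 UUID for a given time (hypothetical helper, fixed node)."""
    delta = dt - GREGORIAN_EPOCH
    ts = (delta.days * 86400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000  # version bits = 1
    clock_seq_hi_variant = 0x80 | ((clock_seq >> 8) & 0x3F)  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def uuid1_to_datetime(u):
    """Recover the creation time embedded in a version 1 UUID."""
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


base = datetime(2012, 1, 1, tzinfo=timezone.utc)
# Created out of order on purpose: +10 min, +0 min, +5 min.
ids = [uuid1_from_datetime(base + timedelta(minutes=m)) for m in (10, 0, 5)]

# Sort on the embedded 60-bit timestamp, newest first.
newest_first = sorted(ids, key=lambda u: u.time, reverse=True)
for u in newest_first:
    print(u, uuid1_to_datetime(u).isoformat())
```

In real use you would call uuid.uuid1() directly; the sketch builds the UUIDs by hand only to control the timestamps.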

uuid2() no longer seems to be used.

uuid3() generates a UUID by taking an MD5 hash of an arbitrary name that you choose within a certain namespace (for example, URL, domain name, etc.).

  • Pros: Provides a good way to assign UUID blocks to different namespaces. Easy to reproduce the UUID from the name.
  • Cons: If you already have a unique name, why do you need a UUID?
  • Collisions are possible: if you reuse a name within a namespace, or if there is a hash collision.

uuid4() generates a completely random UUID.

  • Pros: no privacy concerns. No need to create unique names.
  • Cons: no structure to the UUID.
  • Collisions are possible: if you use a bad random number generator, reuse a random seed, or are very, very unlucky.

uuid5() is the same as uuid3(), except that it uses a SHA-1 hash instead of MD5. It is officially preferred over uuid3().

But this means that I would have to rewrite some parts, and I would also take on a chance of collisions.

Are there any smart hacks that I haven't thought about?

1 answer

First, with respect to UUIDs, you don't need to worry about collisions if you plan to use either uuid1() or uuid4() (these are the only ones that really get used anyway). The probability of a collision is astronomically low. Do not worry about it.

To get the 30 most recently added keys (along with paging capabilities), you are really talking about time series data. Here's a good introduction to time series with Cassandra. You can use either timestamps or v1 UUIDs as column names, with the unique keys as the column values. If you decide to use v1 UUIDs as the unique keys themselves, you can simply put them directly in the column names. At that point, you are just dealing with regular time series data and paging in Cassandra.
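The paging pattern can be sketched as follows. This is an in-memory stand-in written in Python, not phpcassa or Cassandra code: the `TimelineRow` class and `fetch_page` function are hypothetical names, and the row only imitates a reversed column slice with a start column and a count. The idea is to ask for one column more than the page size; the extra column's name becomes the start of the next (older) page.

```python
import bisect


class TimelineRow:
    """In-memory stand-in for one wide row whose columns stay sorted by name."""

    def __init__(self):
        self._names = []   # sorted column names (timestamps or time-ordered ids)
        self._values = {}  # column name -> user key

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def get_reversed_slice(self, start=None, count=30):
        """Return up to `count` columns, newest first, beginning at `start` (inclusive)."""
        hi = len(self._names) if start is None else bisect.bisect_right(self._names, start)
        names = self._names[max(0, hi - count):hi][::-1]
        return [(name, self._values[name]) for name in names]


def fetch_page(row, page_size=30, start=None):
    """Fetch one page plus the column name at which the next page starts."""
    columns = row.get_reversed_slice(start=start, count=page_size + 1)
    page, extra = columns[:page_size], columns[page_size:]
    next_start = extra[0][0] if extra else None  # None means no older data
    return page, next_start


# 70 users, with integer timestamps 1..70 used as column names.
row = TimelineRow()
for ts in range(1, 71):
    row.insert(ts, "user%d" % ts)

page1, cursor1 = fetch_page(row)                 # users 70..41
page2, cursor2 = fetch_page(row, start=cursor1)  # users 40..11
page3, cursor3 = fetch_page(row, start=cursor2)  # users 10..1, no next page
print(page1[0], page2[0], page3[-1], cursor3)
```

Carrying the next start column from page to page avoids counting or skipping columns on the server, which is why the extra column is fetched.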


Source: https://habr.com/ru/post/1382756/

