Sequentially Generate GUIDs in Java with SQL Server Unique Identifier

The problem I'm trying to solve is this:

1 - In our database, we have all tables (also tables with millions of records) with a PK identifier column declared as VARCHAR (36). It also has a clustered index . Of course, since I read online, this is terrible for performance, also because there are a lot of reads, inserts, updates, and deletes in db.

2 - We use Hibernate for our java web application as an ORM for this db

After reading extensively on the Internet, I started changing the data type of these columns to UNIQUEIDENTIFIER with the default option newsequentialid () , since this parameter should mitigate the fragmentation problems of our indexes.

What I noticed was that the fragmentation problem persisted, the tables became very fragmented after the rebuilds (we rebuild the full index every night).

Then I saw that all of our Hibernate mappings for id columns consist of this:

<id name="id" column="id" type="string"> <generator class="guid"/> </id> 

When an insertion occurred in our system, the log showed that the insertion was performed after calling select newid() , since this returns a random guide, the insertion will be placed at a random point in the index, which will lead to fragmentation (this completely defeats the change in the column data type, which I also did).

So, after another online search, I tried to implement the guide generator in Hibernate myself, introducing the IdentifierGenerator interface and using the JUG time generator ( http://wiki.fasterxml.com/JugHome ).

Code that generates an identifier (I thought it was sequential):

 String uuid = null; EthernetAddress nic = EthernetAddress.fromInterface(); TimeBasedGenerator uuidGenerator = Generators.timeBasedGenerator(nic); uuid = uuidGenerator.generate().toString(); 

And I changed the mapping accordingly to this:

 <id name="id" column="id" type="string"> <generator class="my_package.hibernate.CustomSequentialGuidGenerator"> </generator> </id> 

Then I tried to generate some test uuids to check their sequence (sequential in the uniqueidentifier method, therefore binary), this is a short list (each element is created before the serial):

 314a9a1b-6295-11e5-8d2c-2c27d7e1614f 3d867801-6295-11e5-ae09-2c27d7e1614f 4434ac7d-6295-11e5-9ed1-2c27d7e1614f 491462c4-6295-11e5-af81-2c27d7e1614f 5389ff4c-6295-11e5-84cf-2c27d7e1614f 57098959-6295-11e5-b203-2c27d7e1614f 5b62d144-6295-11e5-9883-2c27d7e1614f 

It looks like alphabetically sequential, but not binary serial.

The above test was performed using a sevenfold test application, it was not a cycle.

I tried to insert these values ​​into a column declared as a unique identifier, and after issuing a selection in this column, these are the outputs of the sql-list server:

 5389FF4C-6295-11E5-84CF-2C27D7E1614F 314A9A1B-6295-11E5-8D2C-2C27D7E1614F 5B62D144-6295-11E5-9883-2C27D7E1614F 4434AC7D-6295-11E5-9ED1-2C27D7E1614F 3D867801-6295-11E5-AE09-2C27D7E1614F 491462C4-6295-11E5-AF81-2C27D7E1614F 57098959-6295-11E5-B203-2C27D7E1614F 

So I really don’t understand what I should do, and if I can use the JUG as a serial key generator to avoid fragmentation problems.

This is another JUG test, I tried 3 runs each time, generating 10 commands using a loop:

Run 1

 54bd156e-62a2-11e5-a1a7-2c27d7e1614f 54c3cc2f-62a2-11e5-a1a7-2c27d7e1614f 54caf820-62a2-11e5-a1a7-2c27d7e1614f 54d1aee1-62a2-11e5-a1a7-2c27d7e1614f 54d901e2-62a2-11e5-a1a7-2c27d7e1614f 54df9193-62a2-11e5-a1a7-2c27d7e1614f 54e64854-62a2-11e5-a1a7-2c27d7e1614f 54ecff15-62a2-11e5-a1a7-2c27d7e1614f 54f3b5d6-62a2-11e5-a1a7-2c27d7e1614f 54fa4587-62a2-11e5-a1a7-2c27d7e1614f 

Run 2

 87c66bcc-62a2-11e5-8e7c-2c27d7e1614f 87ccd46d-62a2-11e5-8e7c-2c27d7e1614f 87d3641e-62a2-11e5-8e7c-2c27d7e1614f 87d97e9f-62a2-11e5-8e7c-2c27d7e1614f 87e05c70-62a2-11e5-8e7c-2c27d7e1614f 87e6ec21-62a2-11e5-8e7c-2c27d7e1614f 87ed7bd2-62a2-11e5-8e7c-2c27d7e1614f 87f40b83-62a2-11e5-8e7c-2c27d7e1614f 87fac244-62a2-11e5-8e7c-2c27d7e1614f 880103d5-62a2-11e5-8e7c-2c27d7e1614f 

Run 3

 a4b690db-62a2-11e5-b667-2c27d7e1614f a4bcd26c-62a2-11e5-b667-2c27d7e1614f a4c2eced-62a2-11e5-b667-2c27d7e1614f a4c92e7e-62a2-11e5-b667-2c27d7e1614f a4cf48ff-62a2-11e5-b667-2c27d7e1614f a4d5d8b0-62a2-11e5-b667-2c27d7e1614f a4dc6861-62a2-11e5-b667-2c27d7e1614f a4e34632-62a2-11e5-b667-2c27d7e1614f a4e9d5e3-62a2-11e5-b667-2c27d7e1614f a4f101d4-62a2-11e5-b667-2c27d7e1614f 

Run 4

 c2b872b2-62a2-11e5-b855-2c27d7e1614f c2c17363-62a2-11e5-b855-2c27d7e1614f c2c82a24-62a2-11e5-b855-2c27d7e1614f c2ce92c5-62a2-11e5-b855-2c27d7e1614f c2d57096-62a2-11e5-b855-2c27d7e1614f c2dc2757-62a2-11e5-b855-2c27d7e1614f c2e32c38-62a2-11e5-b855-2c27d7e1614f c2e9bbe9-62a2-11e5-b855-2c27d7e1614f c2f099ba-62a2-11e5-b855-2c27d7e1614f c2f7507b-62a2-11e5-b855-2c27d7e1614f 

Run 5

 f0263d1b-62a2-11e5-8529-2c27d7e1614f f02d1aec-62a2-11e5-8529-2c27d7e1614f f033d1ad-62a2-11e5-8529-2c27d7e1614f f03a615e-62a2-11e5-8529-2c27d7e1614f f041181f-62a2-11e5-8529-2c27d7e1614f f047a7d0-62a2-11e5-8529-2c27d7e1614f f04dc251-62a2-11e5-8529-2c27d7e1614f f05403e2-62a2-11e5-8529-2c27d7e1614f f05a6c83-62a2-11e5-8529-2c27d7e1614f f0608704-62a2-11e5-8529-2c27d7e1614f 

Startup 6 (started up again from 0)

 00fd4ec3-62a3-11e5-8ab8-2c27d7e1614f 01042c94-62a3-11e5-8ab8-2c27d7e1614f 010b3175-62a3-11e5-8ab8-2c27d7e1614f 0111e836-62a3-11e5-8ab8-2c27d7e1614f 0118ed17-62a3-11e5-8ab8-2c27d7e1614f 011fcae8-62a3-11e5-8ab8-2c27d7e1614f 0126a8b9-62a3-11e5-8ab8-2c27d7e1614f 012d115a-62a3-11e5-8ab8-2c27d7e1614f 0133c81b-62a3-11e5-8ab8-2c27d7e1614f 013a30bc-62a3-11e5-8ab8-2c27d7e1614f 

Single groups in alphabetical order (but not binary) are ordered and take different runs in general, they are not events in alphabetical order (sigh).

What am I missing?

************************** EDIT - description of my implementation ********* *********

After various comments and answers, I implemented the following strategy:

I created my own sequences (based on the current timestamp), and this is the generator class:

 package it.hibernate; import java.io.Serializable; import java.text.SimpleDateFormat; import java.util.Date; import org.apache.commons.lang.RandomStringUtils; import org.hibernate.HibernateException; import org.hibernate.engine.SessionImplementor; import org.hibernate.id.IdentifierGenerator; public class CustomSequentialGuidGenerator implements IdentifierGenerator{ @Override public Serializable generate(SessionImplementor session, Object object) throws HibernateException { String uuid = null; try { Date data = new Date(); SimpleDateFormat sdf = new SimpleDateFormat(); String rand = RandomStringUtils.randomAlphanumeric(12); sdf.applyPattern("yyyy"); String year = sdf.format(data); sdf.applyPattern("MM"); String month = sdf.format(data); sdf.applyPattern("dd"); String day = sdf.format(data); sdf.applyPattern("HH"); String hour = sdf.format(data); sdf.applyPattern("mm"); String mins = sdf.format(data); sdf.applyPattern("ss"); String secs = sdf.format(data); sdf.applyPattern("SSS"); String millis = sdf.format(data); //G carachter is used to insert the rows after uuid = "GG" + year + month + "-" + day + hour + "-" + mins + secs + "-" + "0" + millis + "-" + rand; } catch (Exception exception) { exception.printStackTrace(); } return uuid; } } 

You may notice that all lines start with the line 'GG' , because I had to make sure that all new lines will be inserted AFTER all old lines generated with select newid() . After that, the current timestamp and 12 random characters to avoid collision in the case of inserting several lines in the same millisecond.

After test 2000, insert fragmentation of the primary key index decreased from 17.92% to 0.15%.

NB Data type reentry is obviously varchar (36), not a unique identifier, so strings are sorted alphabetically.

+5
source share
1 answer

The standard newsequentialid() option, of course, did not work, because hibernate does not use the default value, it always sets the value returned by its generator.

If you take a quick look at the JUG library, then, apparently, it does not offer any means of generating GUIDs sequentially. I don’t know why you thought that the generator generate() method obtained through Generators.timeBasedGenerator() would give you consistent GUIDs. A time-based generator is simply a generator that takes into account the current time when generating the GUID, but it can manipulate the current time coordinate in any way that it considers necessary when embedding in the GUID, so it does not guarantee to be consistent with respect to the received GUIDs.

As a rule, the terms "GUID" and "sequential" are incompatible with each other. You can either have keys that are GUIDs or keys that are sequential, but under normal circumstances you cannot have both.

So, are you sure the keys should be GUID? Personally, I find the GUID very hard to work with.

But if you have to do any hacks necessary in order to have consecutive GUIDs, then my recommendation would be to write your own function that generates 36-character strings that look like GUIDs but are consecutive.

The sequential part must come from SEQUENCE , which simply yields consecutive integers. (I believe MS-SQL-Server supports them.)

You can read the IETF UUID specification on how to build a GUID correctly, but you do not need to follow it in the letter. For the most part, if it just looks like a GUID, it is good enough.

If you can have one global sequence for this, that’s good. If you cannot have a single global sequence, you need to somehow identify your sequences, and then consider the identifier of each sequence when creating your GUIDs. (This will be the "node id" mentioned in the IETF documentation.)

I once had an unreasonable requirement that the lines that I had to pass to a certain web service should be identified using GUIDs, and there was too much red tape that prevented me from contacting them to ask them: "You are frigins “Seriously?” So I just passed the GUIDs like this:

 |--- random part -----| |-- key ---| 314a9a1b-6295-11e5-8d2c-000000000001 314a9a1b-6295-11e5-8d2c-000000000002 314a9a1b-6295-11e5-8d2c-000000000003 314a9a1b-6295-11e5-8d2c-000000000004 314a9a1b-6295-11e5-8d2c-000000000005 ... 

They did not say a word.

+5
source

Source: https://habr.com/ru/post/1232178/


All Articles