Need to insert 100,000 rows into MySQL using Hibernate in less than 5 seconds

I am trying to insert 100,000 rows into a MySQL table in less than 5 seconds using Hibernate (JPA). I have tried every Hibernate batching trick that was suggested and still cannot do better than 35 seconds.

1st optimization: I started with the IDENTITY generator, which took 60 seconds to insert. I later abandoned the generator and started assigning the @Id field myself, reading MAX(id) once and using AtomicInteger.incrementAndGet() to assign ids. This reduced the insertion time to 35 seconds.
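The id-assignment trick described above can be sketched roughly like this (a minimal sketch, not the poster's actual code; the seeding via MAX(id) and the class name are assumptions):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class IdAllocator {

    private final AtomicInteger counter;

    // Seed the counter once with the current MAX(id) read from the database,
    // e.g. "SELECT MAX(id) FROM deal"; use 0 if the table is empty.
    public IdAllocator(int currentMaxId) {
        this.counter = new AtomicInteger(currentMaxId);
    }

    // Thread-safe: each call hands out the next unused id.
    public int nextId() {
        return counter.incrementAndGet();
    }
}
```

Each entity then gets `entity.setId(allocator.nextId())` before `persist()`, so Hibernate never has to round-trip to the database to learn a generated key.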

2nd optimization: I enabled batched inserts by adding

```xml
<prop key="hibernate.jdbc.batch_size">30</prop>
<prop key="hibernate.order_inserts">true</prop>
<prop key="hibernate.current_session_context_class">thread</prop>
<prop key="hibernate.jdbc.batch_versioned_data">true</prop>
```

to the configuration. I was shocked to find that batched inserts did nothing to reduce the insertion time. It was still 35 seconds!

Now I'm thinking of trying multi-threaded inserts. Does anyone have any pointers? Or should I switch to MongoDB?

The following is my configuration:

  1. Hibernate (JPA) configuration:

```xml
<bean id="entityManagerFactoryBean"
      class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
    <property name="dataSource" ref="dataSource" />
    <property name="packagesToScan" value="com.progresssoft.manishkr" />
    <property name="jpaVendorAdapter">
        <bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter" />
    </property>
    <property name="jpaProperties">
        <props>
            <prop key="hibernate.hbm2ddl.auto">${hibernate.hbm2ddl.auto}</prop>
            <prop key="hibernate.dialect">${hibernate.dialect}</prop>
            <prop key="hibernate.show_sql">${hibernate.show_sql}</prop>
            <prop key="hibernate.format_sql">${hibernate.format_sql}</prop>
            <prop key="hibernate.jdbc.batch_size">30</prop>
            <prop key="hibernate.order_inserts">true</prop>
            <prop key="hibernate.current_session_context_class">thread</prop>
            <prop key="hibernate.jdbc.batch_versioned_data">true</prop>
        </props>
    </property>
</bean>

<bean class="org.springframework.jdbc.datasource.DriverManagerDataSource" id="dataSource">
    <property name="driverClassName" value="${database.driver}" />
    <property name="url" value="${database.url}" />
    <property name="username" value="${database.username}" />
    <property name="password" value="${database.password}" />
</bean>

<bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
    <property name="entityManagerFactory" ref="entityManagerFactoryBean" />
</bean>

<tx:annotation-driven transaction-manager="transactionManager" />
```

  2. Entity class:

```java
@Entity
@Table(name = "myEntity")
public class MyEntity {

    @Id
    private Integer id;

    @Column(name = "deal_id")
    private String dealId;

    ....

    @Temporal(TemporalType.TIMESTAMP)
    @Column(name = "timestamp")
    private Date timestamp;

    @Column(name = "amount")
    private BigDecimal amount;

    @OneToOne(cascade = CascadeType.ALL)
    @JoinColumn(name = "source_file")
    private MyFile sourceFile;

    public MyEntity(Integer id, String dealId, ....., Timestamp timestamp,
                    BigDecimal amount, MyFile sourceFile) {
        this.id = id;
        this.dealId = dealId;
        ...
        this.amount = amount;
        this.sourceFile = sourceFile;
    }

    public String getDealId() { return dealId; }
    public void setDealId(String dealId) { this.dealId = dealId; }

    ....

    public BigDecimal getAmount() { return amount; }
    public void setAmount(BigDecimal amount) { this.amount = amount; }

    ....

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
}
```

  3. Persisting code (service):

```java
@Service
@Transactional
public class ServiceImpl implements MyService {

    @Autowired
    private MyDao dao;

    ....

    void foo() {
        for (MyObject d : listOfObjects_100000) {
            dao.persist(d);
        }
    }
}
```

  4. DAO class:

```java
@Repository
public class DaoImpl implements MyDao {

    @PersistenceContext
    private EntityManager em;

    public void persist(Deal deal) {
        em.persist(deal);
    }
}
```

Logs:

```
DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)

... ...

18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.BatchingBatch - Executing batch size: 27
18:26:34.011 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - update deal_source_file set invalid_rows=?, source_file=?, valid_rows=? where id=?
18:26:34.015 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.BatchingBatch - Executing batch size: 1
18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - committed JDBC Connection
18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - re-enabling autocommit
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.orm.jpa.JpaTransactionManager - Closing JPA EntityManager [org.hibernate.jpa.internal.EntityManagerImpl@2354fb09] after transaction
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.o.jpa.EntityManagerFactoryUtils - Closing JPA EntityManager
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.internal.JdbcCoordinatorImpl - HHH000420: Closing un-released batch
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Releasing JDBC connection
18:26:34.033 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Released JDBC connection
```

+5

4 answers

After trying all the possible solutions, I finally found a way to insert 100,000 rows in under 5 seconds!

Things I tried:

1) Replacing Hibernate's / the database's AUTO_INCREMENT / GENERATED id with a self-generated id using AtomicInteger

2) Enabling batch inserts with batch_size=50

3) Flushing and clearing the EntityManager cache every N calls to persist()

4) Multithreading (did not attempt this)

Finally, what worked was a native multi-row insert query, inserting 1,000 rows per SQL INSERT statement instead of calling persist() on every object. To insert 100,000 entities, I build a native query like "INSERT into MyTable VALUES (x,x,x),(x,x,x).......(x,x,x)" [1,000 rows in one SQL INSERT statement].

Now it takes about 3 seconds to insert 100,000 records! So the bottleneck was JPA itself! For bulk inserts, the only thing that seems to work is native multi-row insert queries!
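A minimal sketch of how such a multi-row statement can be assembled (the table and column names are placeholders, not the poster's real schema; real code should bind values through a PreparedStatement rather than concatenating them, to avoid SQL injection):

```java
import java.util.StringJoiner;

public class BulkInsertSql {

    // Build one INSERT statement covering `rows` rows with JDBC-style
    // "?" placeholders; values are bound afterwards, one batch per statement.
    static String buildSql(String table, String[] columns, int rows) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        for (String c : columns) {
            cols.add(c);
        }
        StringJoiner oneRow = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < columns.length; i++) {
            oneRow.add("?");
        }
        StringJoiner values = new StringJoiner(", ");
        for (int i = 0; i < rows; i++) {
            values.add(oneRow.toString());
        }
        return "INSERT INTO " + table + " " + cols + " VALUES " + values;
    }

    public static void main(String[] args) {
        // One statement covering 1,000 rows; repeat 100 times for 100,000 rows.
        String sql = buildSql("deal", new String[] {"id", "deal_id", "amount"}, 1000);
        System.out.println(sql.substring(0, 50));
    }
}
```

The resulting string can then be fed to `em.createNativeQuery(...)` or a plain JDBC `PreparedStatement`, binding `rows × columns` parameters in order.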

+5
  • You use Spring to manage the transaction, but then break it by setting thread as the current session context class. When Spring manages your transactions, don't set the hibernate.current_session_context_class property at all. Remove it.

  • Do not use DriverManagerDataSource; use a proper connection pool such as HikariCP.
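For example, the DriverManagerDataSource bean above could be swapped for a pooled one roughly like this (a sketch based on HikariCP's standard bean properties; the pool size is an arbitrary example value):

```xml
<bean id="dataSource" class="com.zaxxer.hikari.HikariDataSource" destroy-method="close">
    <property name="driverClassName" value="${database.driver}" />
    <property name="jdbcUrl" value="${database.url}" />
    <property name="username" value="${database.username}" />
    <property name="password" value="${database.password}" />
    <property name="maximumPoolSize" value="10" />
</bean>
```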

  • In the for loop, you should flush and clear the EntityManager at regular intervals, preferably matching the batch size. If you never flush, persists take longer and longer: on each persist Hibernate dirty-checks every entity in the first-level cache, and the more entities there are, the more time that takes. With 10 or 100 entities this is acceptable, but dirty-checking 10,000 entities on every persist will take its toll.

```java
@Service
@Transactional
public class ServiceImpl implements MyService {

    @Autowired
    private MyDao dao;

    @PersistenceContext
    private EntityManager em;

    void foo() {
        int count = 0;
        for (MyObject d : listOfObjects_100000) {
            dao.persist(d);
            count++;
            // Flush and clear every 30 entities, matching hibernate.jdbc.batch_size
            if ((count % 30) == 0) {
                em.flush();
                em.clear();
            }
        }
    }
}
```

For a more detailed explanation, see this blog and this blog .

+2

Another option to consider is StatelessSession:

A command-oriented API for performing bulk operations against a database.

A stateless session does not implement a first-level cache nor interact with any second-level cache, nor does it implement transactional write-behind or automatic dirty checking, nor do operations cascade to associated instances. Collections are ignored by a stateless session. Operations performed via a stateless session bypass Hibernate's event model and interceptors. Stateless sessions are vulnerable to data aliasing effects, due to the lack of a first-level cache.

For certain kinds of transactions, a stateless session may perform slightly faster than a stateful session.

Related discussion: Using StatelessSession for batch processing

0

Uff. There is a lot you can do to increase the speed.

1.) Use @DynamicInsert and @DynamicUpdate so that null columns are left out of INSERTs and only modified columns appear in UPDATEs.

2.) Try inserting the rows directly (without Hibernate, e.g. with plain JDBC) into your database to see whether Hibernate really is your bottleneck.

3.) Use a SessionFactory and commit the transaction only every e.g. 100 inserts. Or open and close the transaction just once and flush your data every 100 inserts.

4.) Use the "sequence" id generation strategy and let Hibernate preallocate ids (via the allocationSize parameter).
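For databases that support sequences (note that MySQL itself does not, so this point applies to e.g. PostgreSQL or Oracle), such a mapping could look roughly like this; the generator and sequence names below are made up for illustration:

```java
@Entity
public class Deal {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "deal_seq")
    @SequenceGenerator(name = "deal_seq", sequenceName = "deal_id_seq",
                       allocationSize = 100) // Hibernate grabs ids in blocks of 100
    private Long id;
}
```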

5.) Use caches.

Some of these possible solutions can have timing drawbacks when used incorrectly. But you have plenty of options.

-1

Source: https://habr.com/ru/post/1268325/

