I am currently doing text search in SQL Server, and it is becoming a bottleneck. I would like to move the search to Elasticsearch for obvious reasons, and I know that I need to denormalize the data for better performance and scalability.
Currently, my text search involves some aggregation and joining of several tables to get the final result. The tables being combined are not that large (up to 20 GB per table), but they change (inserts, updates, deletes) irregularly: two of them about once a week, the third on demand, roughly once a day.
My plan is to use Apache Kafka with Kafka Connect to read CDC events from SQL Server, stream that data through Kafka, and sink it into Elasticsearch. However, I cannot find any material explaining how deletions are handled, i.e. how a row deleted in SQL Server ends up being deleted in Elasticsearch as well.
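For reference, the source side of what I have in mind looks roughly like the sketch below. This assumes the Debezium SQL Server connector; the hostnames, credentials, database, and table names are placeholders for my setup, and the property names follow the Debezium 2.x naming (1.x used `database.dbname`, `database.server.name`, and `database.history.*` instead).

```
# Sketch of a Debezium SQL Server source connector (standalone-mode
# .properties form). All hosts, credentials, and table names below are
# placeholders; property names follow Debezium 2.x.
name=sqlserver-cdc-source
connector.class=io.debezium.connector.sqlserver.SqlServerConnector
database.hostname=sqlserver.internal
database.port=1433
database.user=cdc_user
database.password=********
database.names=SearchDb
topic.prefix=cdc
# The tables that my search query currently joins
table.include.list=dbo.Articles,dbo.Authors,dbo.Tags
# Debezium keeps the captured schema history in its own Kafka topic
schema.history.internal.kafka.bootstrap.servers=kafka:9092
schema.history.internal.kafka.topic=schema-changes.SearchDb
```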
Is this even supported by the default connector? If not, what are the options? Apache Spark? Logstash?
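To make the question concrete: on the sink side I would start from something like the sketch below, using the Confluent Elasticsearch sink connector plus Debezium's `ExtractNewRecordState` SMT to unwrap the CDC envelope (topic name and `connection.url` are placeholders). The last two properties are my guess at how deletes might be wired up; I have not found this documented end to end, hence the question.

```
# Sketch of a Confluent Elasticsearch sink consuming one Debezium topic.
# Topic name and connection.url are placeholders for my setup.
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=cdc.SearchDb.dbo.Articles
connection.url=http://elasticsearch:9200
# Use the Kafka record key (the source table's PK) as the ES document id,
# so updates overwrite the same document instead of duplicating it.
key.ignore=false
# Flatten Debezium's change-event envelope (before/after/op) to a plain row.
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
# My guess for deletes: keep tombstones so they reach the sink ...
transforms.unwrap.drop.tombstones=false
# ... and tell the sink to delete the document when the record value is null.
behavior.on.null.values=delete
```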