I am currently doing text search in SQL Server, and it is becoming a bottleneck. I would like to move the search to Elasticsearch for obvious reasons, and I know that I need to denormalize the data for better performance and scalability.
Currently, my text search involves some aggregation and joining of several tables to get the final result. The tables being combined are not that large (up to 20 GB per table), but they change (inserts, updates, deletes) irregularly: two of them about once a week, the third on demand, roughly once a day.
My plan is to use Apache Kafka with Kafka Connect to read CDC events from SQL Server, stream that data through Kafka, and sink it into Elasticsearch. However, I cannot find any material explaining how deletions are handled, i.e. how a row deleted in SQL Server ends up being deleted in Elasticsearch as well.
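For reference, the source side of what I have in mind looks roughly like the sketch below. This assumes the Debezium SQL Server connector; the hostnames, credentials, database, and table names are placeholders for my setup, and the property names follow the Debezium 2.x naming (1.x used `database.dbname`, `database.server.name`, and `database.history.*` instead).

```
# Sketch of a Debezium SQL Server source connector (standalone-mode
# .properties form). All hosts, credentials, and table names below are
# placeholders; property names follow Debezium 2.x.
name=sqlserver-cdc-source
connector.class=io.debezium.connector.sqlserver.SqlServerConnector
database.hostname=sqlserver.internal
database.port=1433
database.user=cdc_user
database.password=********
database.names=SearchDb
topic.prefix=cdc
# The tables that my search query currently joins
table.include.list=dbo.Articles,dbo.Authors,dbo.Tags
# Debezium keeps the captured schema history in its own Kafka topic
schema.history.internal.kafka.bootstrap.servers=kafka:9092
schema.history.internal.kafka.topic=schema-changes.SearchDb
```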
Is this even supported by the default connector? If not, what are the options? Apache Spark? Logstash?
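To make the question concrete: on the sink side I would start from something like the sketch below, using the Confluent Elasticsearch sink connector plus Debezium's `ExtractNewRecordState` SMT to unwrap the CDC envelope (topic name and `connection.url` are placeholders). The last two properties are my guess at how deletes might be wired up; I have not found this documented end to end, hence the question.

```
# Sketch of a Confluent Elasticsearch sink consuming one Debezium topic.
# Topic name and connection.url are placeholders for my setup.
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=cdc.SearchDb.dbo.Articles
connection.url=http://elasticsearch:9200
# Use the Kafka record key (the source table's PK) as the ES document id,
# so updates overwrite the same document instead of duplicating it.
key.ignore=false
# Flatten Debezium's change-event envelope (before/after/op) to a plain row.
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
# My guess for deletes: keep tombstones so they reach the sink ...
transforms.unwrap.drop.tombstones=false
# ... and tell the sink to delete the document when the record value is null.
behavior.on.null.values=delete
```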