How to synchronize MSSQL with Elasticsearch?

Every time I search for this on Google, I find the outdated river approach. I use Dapper, if that information is of any use.

So what is the solution for this these days?

+5
5 answers

Your question is fairly broad, so here are pointers to some options.

Elasticsearch is used to search and analyze data.

From the article Deprecating Rivers:

Client Libraries

For more than a year now, we have had official client libraries for Elasticsearch in most popular programming languages. This means that hooking into your application and getting data through your existing codebase should be relatively straightforward. This method also makes it easy to munge data before it gets to Elasticsearch. A common example is an application that already uses an ORM to map the domain model to a database; hooking in and indexing the domain model into Elasticsearch tends to be simple to implement.
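As a rough sketch of that application-level pattern (in Python with the official `elasticsearch` client rather than .NET; the mapping function, field names, and index name are all illustrative assumptions):

```python
from datetime import date

def product_to_doc(product):
    """Map a hypothetical ORM entity (here a plain dict) to an Elasticsearch document."""
    return {
        "name": product["name"],
        "price": float(product["price"]),      # normalize decimals for JSON
        "updated": product["updated"].isoformat(),
    }

def index_product(es, product):
    """Index the entity right after the database write; `es` is an
    elasticsearch.Elasticsearch client (8.x-style `document=` argument)."""
    es.index(index="products", id=product["id"], document=product_to_doc(product))

# Usage (requires a running cluster):
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# index_product(es, {"id": 1, "name": "Widget", "price": "9.99", "updated": date.today()})
```

The point is that the "munging" step lives in one pure function you can test without touching the cluster.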

Extensive documentation on how to use Elasticsearch from .NET is available for:

Elasticsearch.Net .

The docs cover the following topics:

Install Package:

PM> Install-Package Elasticsearch.Net 

Connecting

 var node = new Uri("http://mynode.example.com:8082/apiKey");
 var config = new ConnectionConfiguration(node);
 var client = new ElasticsearchClient(config);

Security

Connection pooling and failover

Building requests

This is what you will need to develop.

Response Processing

Error processing

Plugins

Logstash can also be used instead of rivers; various plugins have been developed for it.

In addition, some of the rivers that Elasticsearch shipped with are now implemented as Logstash plugins (such as the CouchDB one) in the upcoming Logstash 1.5.

Additional reading

Although it is for a different language and framework, David Pilato's blog post on adding advanced search to your legacy application may be useful to review. He recommends doing this at the application level.

To address the questions raised in the comments:

Track data changes.

SQL Server provides a built-in system for tracking data changes, an effective means of automatically tracking changes without having to implement manual methods for checking for them.

There are two ways to do this:

Using Change Data Capture:

Data changes are tracked with timestamps, so you can follow the history of changes.

Change data capture provides historical change information for a user table by capturing both the fact that DML changes were made and the actual data that was changed. Changes are captured by an asynchronous process that reads the transaction log, with low impact on the system.

Using Change Tracking:

It has less overhead, but does not keep historical changes; only the fact of the most recent change is retained.

Change tracking captures the fact that rows in a table were changed, but does not capture the data that was changed. This enables applications to determine which rows have changed, with the latest row data obtained directly from the user tables. Therefore, change tracking is more limited in the historical questions it can answer compared to change data capture.
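As a minimal sketch of polling Change Tracking from application code (Python; it assumes change tracking is already enabled on the database and table, that the table's primary key column is `id`, that the target index is `products`, and that you supply a `pyodbc` cursor and an `elasticsearch` client — the `CHANGETABLE(CHANGES ...)` query itself is standard T-SQL):

```python
def build_change_query(table: str, last_version: int) -> str:
    """Standard T-SQL: rows changed in `table` since change-tracking version
    `last_version`; SYS_CHANGE_OPERATION is I/U/D for insert/update/delete."""
    return (
        "SELECT ct.SYS_CHANGE_OPERATION, ct.id, t.* "
        f"FROM CHANGETABLE(CHANGES {table}, {int(last_version)}) AS ct "
        f"LEFT JOIN {table} AS t ON t.id = ct.id"
    )

def sync_changes(cursor, es, table, last_version):
    """Poll changed rows and push them to Elasticsearch: delete on 'D',
    index (upsert) otherwise. `cursor` is e.g. a pyodbc cursor and `es` an
    elasticsearch.Elasticsearch client; both are assumptions."""
    cursor.execute(build_change_query(table, last_version))
    col_names = [c[0] for c in cursor.description[2:]]   # skip the op and id columns
    for op, row_id, *cols in cursor.fetchall():
        if op == "D":
            es.delete(index="products", id=row_id)       # raises if already gone
        else:
            es.index(index="products", id=row_id, document=dict(zip(col_names, cols)))
```

After each successful pass you would persist the version obtained from `CHANGE_TRACKING_CURRENT_VERSION()` and use it as `last_version` on the next run.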

+6

You can use Logstash to accomplish this. Just use the Logstash JDBC input plugin to configure the pipeline. Follow this link: Transfer MySQL data to ElasticSearch
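A hedged sketch of such a pipeline (the driver path, connection string, credentials, query, and index name are all placeholders to adapt; the option names are those of the Logstash JDBC input plugin):

```conf
input {
  jdbc {
    jdbc_driver_library => "/opt/drivers/mssql-jdbc.jar"   # path to your MSSQL JDBC driver
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://localhost:1433;databaseName=MyDb"
    jdbc_user => "logstash"
    jdbc_password => "secret"
    schedule => "* * * * *"                                # poll every minute
    # :sql_last_value is maintained by the plugin between runs
    statement => "SELECT id, name, price FROM dbo.Products WHERE updated_at > :sql_last_value"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "products"
    document_id => "%{id}"                                 # keep ES ids in sync with SQL keys
  }
}
```

Setting `document_id` to the SQL primary key makes repeated runs idempotent: re-polled rows overwrite their existing documents instead of duplicating them.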

Also check out this repo on GitHub ElasticSearchCRUD

+1

Even though the question asks about synchronizing MSSQL with Elasticsearch, I feel the basic idea of synchronization across heterogeneous systems is much the same. You will need to:

  • Define and create batches of data to synchronize
  • Track the last synchronized batch to know where to resume, usually with tokens
  • Convert the data
  • Transport each batch reliably

The article Continuous Data Sync on Heterogeneous Systems - YoursAndMyIdeas explains how to achieve this in more detail.
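The four steps above can be sketched as a generic loop (Python; the fetch/transform/ship callables and the token store are placeholders you would back with MSSQL, Elasticsearch, and durable storage):

```python
def sync_batches(fetch_batch, transform, ship, load_token, save_token, batch_size=500):
    """Generic batch-sync loop: resume from the last saved token, pull a batch,
    transform each record, ship it, then persist the new token so a crash
    restarts from the last completed batch."""
    token = load_token()                        # where the previous run stopped
    while True:
        batch, next_token = fetch_batch(token, batch_size)
        if not batch:
            return token                        # caught up with the source
        ship([transform(rec) for rec in batch])
        save_token(next_token)                  # commit progress only after shipping
        token = next_token
```

In practice the token might be a Change Tracking version number or a last-modified timestamp, persisted somewhere durable; saving it only after a successful ship gives at-least-once delivery.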

+1

How to synchronize MSSQL with Elasticsearch?

One simple solution would be to set up a PowerShell script (run from a SQL Agent job):

 Import-Module 'sqlps' -DisableNameChecking;
 
 Invoke-Sqlcmd `
     -ServerInstance "(local)\SQL2016" `
     -Database "msdb" `
     -Query "SELECT TOP(1) object_id AS id, name, type_desc FROM sys.objects" |
     Select-Object * -ExcludeProperty ItemArray, Table, RowError, RowState, HasErrors -OutVariable sql_results
 
 $id = $sql_results.id
 
 # In this case, the input object is an array/table with a single row
 $json_results = ConvertTo-Json -InputObject $sql_results[0]
 
 Invoke-RestMethod "http://localhost:9200/index007/type007/$id" `
     -Method Put -Body $json_results -ContentType "application/json"
0

Just my 2¢ on implementing this. I once did it by setting up a trigger that wrote to a buffer table, which acted as an event log. A serverless function (AWS Lambda) then ran on a timer, drained that event log, and pushed the necessary changes to ES. That way I didn't need anything crazy in the trigger, or even to change my source code.
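A sketch of the drain step (Python; the event shape `(row_id, op)` with `op` in 'I'/'U'/'D' and the `fetch_doc` callback are hypothetical — they mirror whatever your trigger writes to the buffer table):

```python
def events_to_bulk_actions(events, index, fetch_doc):
    """Turn buffered trigger events into Elasticsearch bulk actions:
    inserts/updates become an index action followed by the document source,
    deletes become a bare delete action."""
    actions = []
    for row_id, op in events:
        if op == "D":
            actions.append({"delete": {"_index": index, "_id": row_id}})
        else:
            actions.append({"index": {"_index": index, "_id": row_id}})
            actions.append(fetch_doc(row_id))  # current row fetched from the source table
    return actions
```

The returned list alternates action metadata and document source, which is the shape the Elasticsearch `_bulk` endpoint expects; after a successful bulk call you would delete the processed rows from the buffer table.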

0

Source: https://habr.com/ru/post/1270911/

