Traffic and access management for querying Elasticsearch directly via AJAX?

Situation:

For our web store, I want to build product listings, and filters (facets) on those listings, using Elasticsearch. I want to bypass the PHP/MySQL server the application runs on entirely and have the client's browser communicate with Elasticsearch directly through AJAX calls. Benefits:

  • Most of the load on the PHP/MySQL server would be taken over by the ES cluster
  • CDN features (scaling!)
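To make the direct AJAX approach concrete, the following sketch builds the kind of JSON body a browser might POST to Elasticsearch's `_search` endpoint for one listing page plus its filter counts. The index name `products` and the fields `category`, `name`, `price`, and `brand` are hypothetical. Because this body is assembled on the client, anyone can edit it before it is sent, which is the root of the problems raised next.

```python
import json


def build_listing_query(category: str, page: int = 0, page_size: int = 24) -> dict:
    """Build a search body for one product listing page plus its filter facets.

    Field names (category, name, price, brand) are hypothetical examples.
    """
    return {
        "from": page * page_size,
        "size": page_size,
        # Only return the fields the listing actually renders.
        "_source": ["name", "price", "brand"],
        "query": {"bool": {"filter": [{"term": {"category": category}}]}},
        # Facet counts for the filter sidebar.
        "aggs": {
            "brands": {"terms": {"field": "brand", "size": 20}},
            "price_ranges": {
                "range": {
                    "field": "price",
                    "ranges": [{"to": 50}, {"from": 50, "to": 100}, {"from": 100}],
                }
            },
        },
    }


if __name__ == "__main__":
    # This JSON is what the browser would POST to /products/_search (hypothetical index).
    print(json.dumps(build_listing_query("shoes", page=1), indent=2))
```

The `terms` and `range` aggregations return the counts for the filter sidebar in the same response as the hits.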

Problem:

This approach would take a huge load off our server, but it creates several new problems. Anonymous users will generate a lot of requests, and we need some control over them:

Traffic management:

  • How do we stop malicious users from making huge numbers of calls and scanning/downloading our entire product catalog this way (e.g. competitors scraping pricing information)?
  • How do we block an IP address that has been identified (somehow) as behaving badly?
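For the second point, the blocking check itself is simple once an address or range has been flagged; the hard part is detection, which this sketch does not attempt. It assumes some layer in front of the cluster (a proxy, or the L7 filter mentioned later) can see the client IP and run code per request. `IpBlocklist` is a hypothetical helper built on Python's standard `ipaddress` module.

```python
import ipaddress
from typing import Iterable


class IpBlocklist:
    """Blocks single addresses and whole CIDR ranges (IPv4 and IPv6)."""

    def __init__(self, entries: Iterable[str] = ()):
        self._networks = []
        for entry in entries:
            self.block(entry)

    def block(self, entry: str) -> None:
        # strict=False accepts "203.0.113.7/24" and normalizes it to 203.0.113.0/24;
        # a bare address becomes a /32 (IPv4) or /128 (IPv6) network.
        self._networks.append(ipaddress.ip_network(entry, strict=False))

    def is_blocked(self, address: str) -> bool:
        ip = ipaddress.ip_address(address)
        # `in` returns False (not an error) when IP versions differ.
        return any(ip in net for net in self._networks)


if __name__ == "__main__":
    blocklist = IpBlocklist(["198.51.100.23", "203.0.113.0/24"])
    print(blocklist.is_blocked("203.0.113.99"))  # True: inside the blocked /24
    print(blocklist.is_blocked("192.0.2.1"))     # False: not listed
```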

Access control:

  • How do we make sure the endpoint only serves the requests we want to allow?
  • How do we make sure customers see only a selection of result fields and cannot retrieve data from ES that is not intended for them?
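One way to approach both access-control points is to have the client send parameters rather than queries, with the rules enforced somewhere the browser cannot bypass. A minimal sketch of that idea, assuming such an enforcement point exists (a plugin, proxy, or filter in front of ES): only known parameters are accepted, the query body is built on the trusted side, pagination is capped so the catalog cannot be paged through in full, and hits are trimmed to public fields. All names, categories, and limits here are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Hypothetical policy values; adjust to the real catalog.
ALLOWED_PARAMS = {"category", "page", "size", "sort"}
ALLOWED_CATEGORIES = {"shoes", "bags", "hats"}
ALLOWED_SORTS = {"price_asc": {"price": "asc"}, "price_desc": {"price": "desc"}}
PUBLIC_FIELDS = ["name", "price", "brand"]
MAX_PAGE_SIZE = 48
MAX_OFFSET = 480  # deepest result a client may page to


class RejectedRequest(ValueError):
    """Raised when client parameters fall outside the allowed set."""


def build_allowed_query(params: dict[str, Any]) -> dict:
    """Translate whitelisted client parameters into an Elasticsearch search body."""
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        raise RejectedRequest(f"unsupported parameters: {sorted(unknown)}")

    category = str(params.get("category", ""))
    if category not in ALLOWED_CATEGORIES:
        raise RejectedRequest("unknown category")

    sort = ALLOWED_SORTS.get(str(params.get("sort", "price_asc")))
    if sort is None:
        raise RejectedRequest("unknown sort order")

    try:
        size = min(int(params.get("size", 24)), MAX_PAGE_SIZE)
        page = int(params.get("page", 0))
    except (TypeError, ValueError):
        raise RejectedRequest("page and size must be integers") from None
    if size < 1 or page < 0 or page * size > MAX_OFFSET:
        raise RejectedRequest("page out of range")

    return {
        "from": page * size,
        "size": size,
        "_source": PUBLIC_FIELDS,  # ask ES for public fields only
        "query": {"bool": {"filter": [{"term": {"category": category}}]}},
        "sort": [sort],
    }


def strip_hits(es_response: dict) -> list[dict]:
    """Second safeguard: drop any non-public field before the response leaves."""
    return [
        {k: v for k, v in hit.get("_source", {}).items() if k in PUBLIC_FIELDS}
        for hit in es_response.get("hits", {}).get("hits", [])
    ]
```

Filtering twice (via `_source` in the request and again in `strip_hits`) means a mistake in one place does not leak internal fields such as cost prices.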

Having a single machine take care of all this doesn't help; it simply recreates the single machine responsible for everything. I want to take real advantage of the ES cluster without middleware that would itself have to solve the scaling problem.

We do not want to depend entirely on one third-party provider; we are looking for a solution that gives us some flexibility in which providers we work with (for example, switching between Elastic and AWS).

Possible solutions or partial solutions:

I've looked at several "Elasticsearch as a service" options, but I'm not sure about their quality or whether they can solve the problems mentioned at all:

  • www.elastic.co/found: their premium offering includes the “Shield” security plugin, which does not seem to cover all the cases above (only IP blocking, as far as I can tell), but there is a community plugin (https://github.com/floragunncom/search-guard) that can filter result fields, manage users, etc. This seems like a reasonable option, but it is expensive and ties the application to the Found product. We should be able to switch providers if necessary.
  • Amazon's Elasticsearch Service has basic IAM support, and CloudFront can be put in front of it, but it does not provide the kind of access control described above.
  • Install a separate L7 application filtering solution for scraper detection, etc.

Question:

Has anyone worked on this kind of setup and found a good configuration that solves all these problems?

1 answer

First of all, I would recommend restricting access to your Elasticsearch instance with a security group that only allows the IP addresses of your application servers, on ports 22 and 80 (SSH and HTTP) and on 9200 and 9300, the ports Elasticsearch uses for its HTTP API and for node-to-node transport.
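The rule set from the previous paragraph can be expressed as the `IpPermissions` structure that boto3's EC2 `authorize_security_group_ingress` call accepts. This sketch assumes Elasticsearch runs on EC2 instances you manage; the host IPs and the `sg-...` group id are placeholders. Since a security group only admits traffic that some inbound rule allows, browsers could no longer reach the cluster directly with these rules, so their requests would have to go through the application servers.

```python
from __future__ import annotations

# SSH, HTTP, and Elasticsearch's HTTP API (9200) and transport (9300) ports.
DEFAULT_PORTS = (22, 80, 9200, 9300)


def ingress_rules(app_server_ips: list[str], ports=DEFAULT_PORTS) -> list[dict]:
    """Build IpPermissions entries allowing only the given hosts on the given TCP ports."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # /32 limits each range to exactly one application server address.
            "IpRanges": [
                {"CidrIp": f"{ip}/32", "Description": "app server"} for ip in app_server_ips
            ],
        }
        for port in ports
    ]


if __name__ == "__main__":
    rules = ingress_rules(["10.0.1.15", "10.0.1.16"])  # placeholder addresses
    # With boto3 installed and AWS credentials configured, the rules could be applied
    # like this (the group id is a placeholder):
    #   import boto3
    #   boto3.client("ec2").authorize_security_group_ingress(GroupId="sg-...", IpPermissions=rules)
    for rule in rules:
        print(rule["FromPort"], [r["CidrIp"] for r in rule["IpRanges"]])
```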

As for protection against scraping, there is no absolute solution. However, if your goal is simply to limit the load that scrapers put on your application server and ES instance, take a look at https://github.com/davedevelopment/stiphle, a rate-limiting (throttling) library. The example on its page limits a user to 5 requests per second, which seems very reasonable for an average user and can be lowered further if necessary; at that rate, scraping the whole catalog becomes slow enough that scrapers may give up.
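Stiphle itself is a PHP library; the same idea can be sketched in Python as a sliding-window limiter that allows 5 requests per second per identifier, matching the figure in its example. This is not a port of Stiphle's API: Stiphle delays (sleeps) throttled calls, whereas this sketch rejects the excess request so the caller can respond with HTTP 429. The class name and the injectable clock are illustrative choices.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each identifier (e.g. client IP)."""

    def __init__(self, limit: int = 5, window: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, identifier: str) -> bool:
        now = self.clock()
        hits = self._hits[identifier]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: caller should reject, e.g. with HTTP 429
        hits.append(now)
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = SlidingWindowLimiter(limit=5, window=1.0, clock=lambda: fake_now[0])
    print([limiter.allow("203.0.113.7") for _ in range(6)])  # [True, True, True, True, True, False]
    fake_now[0] = 1.0
    print(limiter.allow("203.0.113.7"))  # True: the first five requests have left the window
```

The counters live in one process's memory, so with several application servers they would need a shared store, and idle identifiers should eventually be evicted.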


Source: https://habr.com/ru/post/1237657/

