Situation:
For a web store, I want to build calculated product lists, and filters on those lists, using Elasticsearch. I want to bypass the PHP / MySQL server on which the application runs entirely and talk to Elasticsearch directly from the client's browser through AJAX calls. Benefits:
- Most of the load on the PHP / MySQL server will be handled by the ES cluster instead
- CDN features (scaling!)
Problem:
This approach would take a huge load off our server, but it creates several new problems. Anonymous users will generate a lot of requests, and we need some control over them:
Traffic management:
- How do we guard against malicious users making a large number of calls to scan / download our entire product catalog (e.g. competitors scraping pricing information)?
- How do we block an IP address that has been identified (somehow) as misbehaving?
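To illustrate the kind of control I mean, here is a minimal sketch in Python of per-IP throttling combined with a manual blocklist. The class name `IpGate` and its parameters are hypothetical and not taken from any product; where such a check could run without becoming a single bottleneck is exactly the open question.

```python
import time
from collections import defaultdict, deque


class IpGate:
    """Sliding-window request limiter with a manual blocklist (illustrative only)."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock              # injectable so the logic can be tested
        self.blocked = set()            # IPs flagged as misbehaving
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def block(self, ip):
        """Reject every future request from this IP."""
        self.blocked.add(ip)

    def allow(self, ip):
        """Return True if the request may proceed, False if it should be rejected."""
        if ip in self.blocked:
            return False
        now = self.clock()
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True
```

In a distributed setup the counters would have to live in some shared store rather than in process memory, which is part of what I am unsure how to solve.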
Access control:
- How do we make sure that the interface can only serve the kinds of requests we want to allow?
- How do we make sure that customers see only a selection of result fields and cannot retrieve data from ES that is not intended for them?
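As a concrete example of the two access-control points above, here is a rough Python sketch of the rule I would want enforced on every search body before it reaches ES: only a fixed set of top-level keys is accepted, paging is capped, and `_source` is always overwritten with a list of public fields. The allowlist, the field names and the limits are assumptions made up for illustration.

```python
# Hypothetical allowlist of top-level keys a browser may send in a search body.
ALLOWED_TOP_LEVEL_KEYS = {"query", "aggs", "from", "size", "sort"}
# Hypothetical list of fields that are safe to return to anonymous users.
PUBLIC_FIELDS = ["name", "price", "category"]
MAX_PAGE_SIZE = 50   # assumed cap on hits per request
MAX_OFFSET = 1000    # assumed cap on paging depth, to slow down bulk downloads


def sanitize_search_body(body):
    """Return a copy of the search body that is safe to forward, or raise ValueError."""
    unexpected = set(body) - ALLOWED_TOP_LEVEL_KEYS
    if unexpected:
        raise ValueError("keys not allowed: %s" % sorted(unexpected))

    size = body.get("size", 10)
    offset = body.get("from", 0)
    if not isinstance(size, int) or not 0 <= size <= MAX_PAGE_SIZE:
        raise ValueError("size out of range")
    if not isinstance(offset, int) or not 0 <= offset <= MAX_OFFSET:
        raise ValueError("from out of range")

    safe = dict(body)
    # Source filtering: always force the returned fields to the public list.
    safe["_source"] = list(PUBLIC_FIELDS)
    return safe
```

This only restricts which fields come back in the hits. Queries and aggregations could still probe non-public fields, so a real solution would need to inspect the body more deeply.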
It is important that no single machine takes care of all this, because that would simply recreate the single machine responsible for everything. I want to take real advantage of the ES cluster without adding middleware that then has to solve the scaling issue as well.
We also do not want to depend completely on a third-party partner. We are looking for a solution that leaves us some flexibility in the partners we work with (for example, switching between Elastic and AWS).
Possible solutions or partial solutions:
I have considered several "Elasticsearch as a service" options, but I am not sure about their quality or whether they can even solve the problems mentioned:
- www.elastic.co/found: their premium solution includes the "Shield" service, which does not seem to cover all the cases mentioned above (only IP blocking, as far as I can tell). There is, however, a plugin (https://github.com/floragunncom/search-guard) that can filter result fields and provides a way to manage users, etc. This seems like a reasonable option, but it is expensive and ties the application to the "Found" product. We should be able to switch partners if necessary.
- Amazon AWS Elasticsearch has basic IAM support, and it is possible to put CloudFront in front of it, but it does not provide any access control.
- Install a separate L7 application filtering solution for scraper detection, etc.
Question:
Has anyone worked with this kind of setup and found a good configuration that solves all these problems?