We run an email service that hosts about 10,000 domains, and we store the message headers in a SQL Server database.
I need to implement an application that searches message bodies for keywords. The messages themselves are stored as files on a NAS storage system.
As a proof of concept, I implemented a search engine based on SQL Server: I parsed each message and saved every word in a database table along with the memberid and messageid. This database was on a separate server from the headers database.
The problem with this approach was that I ended up with a table of 600 million rows after processing the messages of only one domain. Obviously, this is not a very scalable solution.
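For illustration, a minimal sketch of what a word-per-row index like this might look like. It uses Python's built-in sqlite3 as a stand-in for SQL Server; the table name `message_words`, the `word` column, and the tokenizer are assumptions, not the actual proof-of-concept code. It shows why the table grows so fast: every word occurrence in every message becomes its own row.

```python
import re
import sqlite3

def tokenize(body):
    """Split a message body into lowercase word tokens (simplistic, for illustration)."""
    return re.findall(r"[a-z0-9']+", body.lower())

def index_message(conn, member_id, message_id, body):
    """Store one row per word occurrence; returns the number of rows inserted."""
    rows = [(word, member_id, message_id) for word in tokenize(body)]
    conn.executemany(
        "INSERT INTO message_words (word, memberid, messageid) VALUES (?, ?, ?)",
        rows,
    )
    return len(rows)

# In-memory database standing in for the separate search server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message_words (word TEXT, memberid INTEGER, messageid INTEGER)"
)

inserted = index_message(
    conn, 1, 100, "Meeting moved to Friday. The meeting room is booked."
)
total_rows = conn.execute("SELECT COUNT(*) FROM message_words").fetchone()[0]
```

Even this two-sentence message produces nine rows, including a repeated word, so row count scales with total word count rather than with message count.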
Since the headers are stored in a SQL Server table, I will need to join the message IDs returned by the search application against the headers table in order to display the messages that contain the searched keywords.
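The lookup described above could be sketched roughly as follows. This again uses sqlite3 as a stand-in and, for simplicity, puts both tables in one database, whereas in the real setup they live on separate servers. The table names `message_headers` and `message_words`, the `subject` column, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message_headers (messageid INTEGER PRIMARY KEY, memberid INTEGER, subject TEXT);
CREATE TABLE message_words (word TEXT, memberid INTEGER, messageid INTEGER);
INSERT INTO message_headers VALUES (100, 1, 'Room booking'), (101, 1, 'Lunch');
INSERT INTO message_words VALUES ('meeting', 1, 100), ('meeting', 1, 100), ('lunch', 1, 101);
""")

def search(conn, member_id, keyword):
    """Return (messageid, subject) for one member's messages containing the keyword."""
    return conn.execute(
        """
        SELECT DISTINCT h.messageid, h.subject
        FROM message_words AS w
        JOIN message_headers AS h ON h.messageid = w.messageid
        WHERE w.memberid = ? AND w.word = ?
        ORDER BY h.messageid
        """,
        (member_id, keyword.lower()),
    ).fetchall()

results = search(conn, 1, "Meeting")
```

The `DISTINCT` is needed because a word that appears several times in one message matches several index rows for the same message.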
Any suggestions for a better architecture? Is there a better alternative to SQL Server? We receive over 20 million messages per day. We are a small company with limited resources for servers, maintenance, etc.
Thanks
klork