Cassandra, Hadoop Hive, or MySQL?

I am developing a web crawler. Which is better for storing its data: Cassandra, Hadoop Hive, or MySQL? And why? I have 1 TB of data from the last 6 months in my MySQL database. I need to index it and show it in my search results as quickly as possible. I expect it to store much more data, for example 10 petabytes, because my crawler is fast. I need fast read/write operations, and I need to integrate the store into my PHP application.

+3
3 answers

It depends on the details of your requirements, but I think HBase would be the best option in your case.
Using HBase as a web crawler backend is well documented, and it is the use case for which the design is described in the BigTable paper.
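As a rough illustration of why this kind of store suits crawl data: the BigTable paper keys its example web table by URL with the host name reversed, so that pages from the same domain sort next to each other. The sketch below only builds such keys in plain Python. The function name `crawl_row_key` and the sample URLs are made up for this example, and nothing here talks to HBase itself.

```python
from urllib.parse import urlsplit


def crawl_row_key(url: str) -> str:
    """Build a row key with the host reversed: www.example.com -> com.example.www.

    Hypothetical helper for illustration; the key layout is an assumption,
    not a requirement of HBase.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    reversed_host = ".".join(reversed(host.split(".")))
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return reversed_host + path


# Made-up sample URLs from two domains.
urls = [
    "http://www.example.com/index.html",
    "http://blog.example.com/post/1",
    "http://www.example.org/",
    "http://example.com/about",
]

# HBase keeps rows sorted by key, so sorting here mimics the storage order:
# all example.com pages end up adjacent, ahead of example.org.
keys = sorted(crawl_row_key(u) for u in urls)
for key in keys:
    print(key)
```

With keys like these, a range scan over the prefix `com.example` would read one site's pages together, which is the access pattern a crawler usually wants.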

+3

You are looking for something designed to search documents by their contents, so it should be built on an inverted index. I think the most natural option would be Lucene.

See also this article on the Hadoop-Lucene stack for querying terabytes of documents.
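For readers unfamiliar with the term, here is a minimal sketch of an inverted index: a map from each term to the set of documents that contain it. This is a toy in-memory version with made-up sample documents and hypothetical function names, not how Lucene is implemented. Lucene adds analysis, scoring, compression, and on-disk segments on top of the same basic idea.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up documents standing in for crawled pages.
docs = {
    1: "Cassandra handles fast writes",
    2: "HBase stores crawled web pages",
    3: "Lucene indexes crawled pages for fast search",
}

index = build_index(docs)
hits = search(index, "crawled pages")
print(sorted(hits))
```

A lookup touches only the posting sets of the query terms instead of scanning every document, which is why this structure is the usual basis for full-text search.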

0

, HBASE . Cassandra , HBASE.

. Impala.

0

Source: https://habr.com/ru/post/1760190/

