Cassandra, Hadoop Hive, or MySQL?

Question

I am developing a web crawler. Which is the better choice for storing its data: Cassandra, Hadoop Hive, or MySQL? And why?

I have 1 TB of data from the last 6 months in my MySQL database. I need to index it and show it in my search results as quickly as possible. I expect the data to grow much larger, for example to 10 petabytes, since my crawler is fast. I need fast read/write operations, and I need to integrate the store into my PHP application.

+3

mysql cassandra hbase hadoop

Jesvin Aug 17 '10 at 21:18

3 answers

You are looking for something designed to search documents by their contents, so it should be built on an inverted index. I think the most natural option would be Lucene.

See also this article on the Hadoop-Lucene stack for querying terabytes of documents.
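
To illustrate what "based on an inverted index" means, here is a minimal sketch in plain Python. It is not how Lucene is implemented; the function names (`tokenize`, `build_index`, `search`) and the sample documents are made up for the example. The idea is only that each term maps to the set of documents containing it, so a query touches a few term entries instead of scanning every document.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical crawled pages, keyed by document id.
docs = {
    1: "HBase stores web crawler pages",
    2: "MySQL stores relational rows",
    3: "A fast web crawler needs fast writes",
}
index = build_index(docs)
matches = search(index, "web crawler")
```

A real search engine adds ranking, on-disk index segments, and text analysis on top of this, which is the part a library such as Lucene provides.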

0

Ken Bloom Aug 20 '10 at 3:48

[...] HBase. Cassandra [...] HBase.

[...] Impala.

0

K S Nidhin 01 . '13 5:07

wlk · Accepted Answer · 2010-08-17T22:32:45+0000

It depends on the details of your requirements, but I think HBase would be the best option in your case.
Using HBase as the datastore for a web crawler is well documented, and it is the use case described in the Bigtable paper, the design on which HBase is modeled.
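
As a rough illustration of that use case: the Bigtable paper's web table keys each page by its URL with the hostname components reversed (for example `com.cnn.www`), so that pages from the same domain sort next to each other. The sketch below only builds such row keys in plain Python. It does not talk to HBase, and the helper name `row_key` and the exact key layout (reversed host followed by path and query) are assumptions made for the example.

```python
from urllib.parse import urlsplit


def row_key(url):
    """Build a row key with the hostname reversed, e.g. com.cnn.www/index.html."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    reversed_host = ".".join(reversed(host.split(".")))
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return reversed_host + path


# Hypothetical crawled URLs.
urls = [
    "http://www.cnn.com/index.html",
    "http://maps.google.com/",
    "http://money.cnn.com/markets",
]
# Sorting the keys shows pages of the same domain ending up adjacent,
# which is the property a sorted key-value store can exploit for range scans.
keys = sorted(row_key(u) for u in urls)
```

Because HBase keeps rows sorted by key, a layout like this lets a scan over a key prefix read all stored pages of one domain in a single range.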
