Hadoop (despite what the popular media says) is not a database. What you are describing is a database workload, so Hadoop itself is not a good candidate for you. Also, what follows is fairly opinionated, so feel free to prove me wrong with your own benchmarks.
If you mean the NoSQL databases that sit on top of Hadoop:
- HBase is well suited for write-heavy workloads, but it is bad at large-scale deletes (see the sketch after this list).
- Cassandra is much the same story, though it is not as write-fast as HBase.
- Accumulo may be useful for very frequent updates, but deletes will hurt there as well.
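For context, here is a minimal sketch of why deletes are painful in this family of stores. It uses the HBase 2.x Java client with a hypothetical `events` table and row key (both assumptions, not from the question): a `Delete` only writes a tombstone marker, and the underlying cells are reclaimed much later, during major compaction.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TombstoneExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("events"))) {
            // This does not remove the data right away: it only writes a
            // tombstone marker. The cells are physically dropped during a
            // later major compaction, which is why mass deletes get expensive.
            Delete delete = new Delete(Bytes.toBytes("row-000042"));
            table.delete(delete);
        }
    }
}
```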
None of them makes "real" use of SSDs; I don't think any of them gets a huge speedup from them.
They all suffer from costly compactions once your tablets (in BigTable speak) start to fragment, so deleting is a pretty obvious limiting factor.
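As a rough illustration of that cost (again an HBase-specific sketch with the hypothetical `events` table), reclaiming tombstoned data means forcing a major compaction, which rewrites every store file of the table:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CompactionExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // A major compaction rewrites all of the table's store files; it is
            // the point where tombstoned cells are actually dropped. On a
            // fragmented, delete-heavy table this means a lot of I/O.
            admin.majorCompact(TableName.valueOf("events"));
        }
    }
}
```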
What you can do to work around the deletion problem is to simply overwrite rows with a constant "deleted" marker value, which sidesteps compaction altogether. However, your table then keeps growing, which can get expensive on SSDs. You will also need to filter the markers out on read, which may hurt read latency.
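Here is a minimal sketch of that workaround, again assuming HBase and a hypothetical `status` qualifier used as the "deleted" marker: the delete becomes a plain overwrite, and every read has to filter the marked rows out.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class SoftDeleteExample {
    private static final byte[] CF = Bytes.toBytes("d");
    private static final byte[] STATUS = Bytes.toBytes("status");
    private static final byte[] DELETED = Bytes.toBytes("deleted");

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("events"))) {

            // "Delete" a row by overwriting a marker column instead of issuing
            // a real Delete: no tombstone, no compaction pressure, but the row
            // stays on disk and keeps consuming space.
            Put softDelete = new Put(Bytes.toBytes("row-000042"));
            softDelete.addColumn(CF, STATUS, DELETED);
            table.put(softDelete);

            // Reads now have to filter the markers out, which costs latency.
            SingleColumnValueFilter notDeleted = new SingleColumnValueFilter(
                    CF, STATUS, CompareOperator.NOT_EQUAL, DELETED);
            notDeleted.setFilterIfMissing(false); // keep rows that have no marker
            Scan scan = new Scan().setFilter(notDeleted);
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```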
From what you describe, Amazon's DynamoDB architecture sounds like the best candidate here. Although deletes are expensive there too, the impact may not be as severe as with the alternatives above.
BTW: the recommended way to delete lots of rows from any of the databases above is to simply drop the table entirely. If you can fit your design into that paradigm, any of them will do (see the sketch below).
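For example, here is a sketch with the HBase Admin API and hypothetical time-partitioned table names: if each partition lives in its own table, "deleting" old data is just a drop plus a create, with no tombstones or compactions involved.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class DropInsteadOfDelete {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName expired = TableName.valueOf("events_2013_01"); // hypothetical monthly partition
        TableName fresh   = TableName.valueOf("events_2013_02");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Dropping a whole table removes its files outright: no tombstones,
            // no compactions, just a metadata change plus file deletion.
            if (admin.tableExists(expired)) {
                admin.disableTable(expired);
                admin.deleteTable(expired);
            }
            // Create the next partition and keep writing there.
            admin.createTable(TableDescriptorBuilder.newBuilder(fresh)
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of(Bytes.toBytes("d")))
                    .build());
        }
    }
}
```

Whether this fits depends entirely on whether your delete pattern lines up with a partition key such as time.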