Hadoop, hardware and bioinformatics

We are going to purchase new hardware to run our analyses and are wondering whether we are making the right decisions.

Setup:
We are a bioinformatics laboratory that will process DNA sequencing data. The biggest problem in our field is the amount of data, not the computation. A single experiment easily reaches tens to hundreds of GB, and we will usually run several experiments at the same time. MapReduce-based approaches are obviously of interest (see also http://abhishek-tiwari.com/2010/08/mapreduce-and-hadoop-algorithms-in-bioinformatics-papers.html ), but not all of our software follows this paradigm. In addition, some of our tools use ASCII files for input/output, while others work with binary files.

What we can buy:
The machine we could buy would be a server with 32 cores and 192 GB of RAM, connected to NAS storage (> 20 TB). This is a very attractive setup for many of our (non-MapReduce) applications, but will this configuration prevent us from using Hadoop / MapReduce / HDFS in a meaningful way?

Thanks a lot,
January

1 answer

You have an interesting configuration. What disk I/O throughput will your NAS storage deliver?

: Map . , . . . , , . Map Reduce ? Map Reduce , . , , , . , - 100 /. 100 100 * 100 / = 10 /.

. , IO .
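The argument above turns on disk I/O rather than CPU. As a rough back-of-envelope sketch, assuming (hypothetically) about 100 MB/s for the NAS link, about 100 MB/s sequential read per local disk, and a 100-node cluster, the following compares how long one pass over a 100 GB experiment would take in each case. None of these figures are measurements; substitute your own.

```python
# Back-of-envelope comparison of scan time for one dataset:
# a single shared NAS link versus N local disks read in parallel (HDFS-style).
# All throughput figures below are hypothetical placeholders, not measurements.


def scan_seconds(dataset_gb: float, throughput_mb_s: float) -> float:
    """Seconds to read dataset_gb at a sustained throughput_mb_s (1 GB = 1000 MB)."""
    return dataset_gb * 1000 / throughput_mb_s


def aggregate_mb_s(disks: int, per_disk_mb_s: float) -> float:
    """Ideal aggregate throughput when every disk streams its own local block."""
    return disks * per_disk_mb_s


if __name__ == "__main__":
    dataset_gb = 100      # upper end of "tens to hundreds of GB" per experiment
    nas_mb_s = 100        # hypothetical: NAS reached over one shared link
    per_disk_mb_s = 100   # hypothetical sequential read of one local disk
    disks = 100           # hypothetical cluster size

    cluster_mb_s = aggregate_mb_s(disks, per_disk_mb_s)
    print(f"NAS:     {scan_seconds(dataset_gb, nas_mb_s):.0f} s")
    print(f"Cluster: {cluster_mb_s:.0f} MB/s aggregate, "
          f"{scan_seconds(dataset_gb, cluster_mb_s):.0f} s")
```

The cluster figure is an ideal upper bound: it assumes each map task reads a block stored on its own node and ignores network, seek and scheduling overhead. Plugging in the NAS throughput you actually measure shows how much a data-local HDFS setup could gain over streaming everything through the NAS.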

Source: https://habr.com/ru/post/1793123/