I have a project in which I am doing data mining on a large data set. I currently store all the data in text files, and I am trying to understand the costs and benefits of storing it in a relational database instead. The points look like this:
CREATE TABLE data ( source1 CHAR(5), source2 CHAR(5), idx11 INT, idx12 INT, idx21 INT, idx22 INT, point1 FLOAT, point2 FLOAT );
How many such data points can be stored with reasonable performance? I currently have about 150 million data points, and I probably won't have more than 300 million. Suppose I use a box with 4 dual-core 2 GHz Xeon processors and 8 GB of RAM.
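To put those row counts in perspective, here is a rough back-of-envelope sketch of the raw column data size for the table above. The byte widths are assumptions, not measurements: CHAR(5) in a single-byte character set, 4-byte INT, and 4-byte single-precision FLOAT (in PostgreSQL, FLOAT with no precision means double precision, i.e. 8 bytes). Per-row overhead, page fill, and indexes are ignored, so the real on-disk size will be larger.

```python
# Back-of-envelope payload size for the `data` table.
# Assumed widths only; ignores row headers, page fill and indexes.
COLUMN_BYTES = {
    "source1": 5,  # CHAR(5), assuming a single-byte character set
    "source2": 5,
    "idx11": 4,    # INT, assumed 4 bytes
    "idx12": 4,
    "idx21": 4,
    "idx22": 4,
    "point1": 4,   # FLOAT, assumed 4-byte single precision
    "point2": 4,
}

ROW_BYTES = sum(COLUMN_BYTES.values())


def payload_gib(rows: int) -> float:
    """Raw column data for `rows` rows, in GiB (2**30 bytes)."""
    return rows * ROW_BYTES / 2**30


for rows in (150_000_000, 300_000_000):
    print(f"{rows:>11,} rows: about {payload_gib(rows):.1f} GiB of raw column data")
```

Under these assumptions, the raw column data at 300 million rows is already larger than the 8 GB of RAM, so the whole table would not stay cached in memory.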
MySQL , Alex PostgreSQL. , DML, , , .
, PostgreSQL , MySQL . MyISAM , , , concurrency , , InnoDB MySQL, . , MyISAM InnoDB, , . MyISAM . 1 MySQL MyISAM , . MySQL MySQL Storage Engines , 113M , .
, . , , . , , , , . (SQL) .. ..
PostgreSQL - 32 .. .. , 5 , 10 ( 36 / 300 ), .
FYI: Postgres , MySQL, / , , (, ).
, ( , - ) . , Postgres.
OTOH, if the data is loaded once and then scanned in a single thread, it is possible that MySQL in "ACID is not required" mode would be a better match.
You need to do some planning around your access pattern(s) before you can select the "best" stack.
Source: https://habr.com/ru/post/1712816/