I'm new to databases and I'm trying to find a good solution for working with large data sets. I mainly do statistical analysis in R, so I don't need the database as a backend for web pages or anything like that. The data sets are usually static; they are simply large.
I tried a simple left join of a ~10,000,000-record table against a 1,400,000-record table. The 1.4 m table had unique keys. After churning for 3 hours, it gave up on me. The query itself was written correctly: when I ran it limited to 1000 result records, it returned exactly what I expected. Eventually I found a way to split the job into 10 queries and it ran, but by that point I had managed to do the merge quickly in R, without any fancy SQLite calls or indexing.
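For reference, the R merge was something along these lines (just a sketch; "big", "small", and the key column "id" are placeholder names, not my real schema):

    # Placeholder names; the real tables/columns differ
    big   <- read.csv("big_table.csv")     # ~10,000,000 rows
    small <- read.csv("small_table.csv")   # ~1,400,000 rows, unique keys
    # Left join in base R: keep all rows of 'big', attach matches from 'small'
    merged <- merge(big, small, by = "id", all.x = TRUE)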
I was looking at databases because I thought they were faster / more efficient for these basic data manipulations, but maybe I'm just missing something. In the example above I had indexed the relevant columns, and I'm surprised that SQLite couldn't handle it while R could.
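In case it helps, the SQLite side was set up roughly like this through RSQLite (again just a sketch with placeholder table and column names):

    library(RSQLite)
    con <- dbConnect(SQLite(), "mydata.sqlite")
    # Index the join column on both tables before running the query
    dbExecute(con, "CREATE INDEX IF NOT EXISTS idx_big_id ON big_table(id)")
    dbExecute(con, "CREATE INDEX IF NOT EXISTS idx_small_id ON small_table(id)")
    # The left join that ran for hours on the full tables
    res <- dbGetQuery(con, "
        SELECT b.*, s.*
        FROM big_table b
        LEFT JOIN small_table s ON b.id = s.id")
    dbDisconnect(con)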
Sorry if this question is a bit foggy (I'm a bit foggy on databases), but if anyone has advice on something obvious I'm doing wrong that keeps me from taking advantage of SQLite, that would be great. Or am I just expecting too much, and is a 10 m x 1.4 m merge simply too big to execute without breaking it up?
I would have thought that a database could outperform R in this regard?
Thanks!
EXL