Using sqlite for very large merges and basic queries

I'm new to databases, and I'm trying to find a good solution for working with large data sets. I mostly do statistical analysis using R, so I don't need the database as a backend for web pages or anything like that. The data sets are usually static - they're just large.

I tried to set up a simple left join of a ~10,000,000-record table onto a 1,400,000-record table. The 1.4 m table had unique entries. After churning for 3 hours, it crapped out on me. The query itself was set up correctly: when I ran it with the results limited to 1,000 records, it returned exactly what I expected. Eventually I found a way to split this into 10 queries, and that ran, but by then I could have done the merge quickly in R, without any fancy sqlite calls and indexing.
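For concreteness, here is roughly what that setup looks like through RSQLite. This is a minimal sketch, not my actual code: the table names (big, small), the join column (key), and the database file name are all made-up placeholders.

```r
# Minimal sketch of the join described above via RSQLite;
# all table/column/file names here are hypothetical placeholders.
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), "data.sqlite")

# Sanity check: the same query capped at 1,000 records comes back fine
check <- dbGetQuery(con, "
  SELECT b.*, s.val
  FROM big b
  LEFT JOIN small s ON b.key = s.key
  LIMIT 1000")

# The full ~10 m x 1.4 m left join -- the step that churned for hours
res <- dbGetQuery(con, "
  SELECT b.*, s.val
  FROM big b
  LEFT JOIN small s ON b.key = s.key")

dbDisconnect(con)
```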

I was looking to use databases because I thought they were faster / more efficient for these basic data manipulations, but maybe I'm just missing something. In the example above I had indexes on the relevant columns, and I'm surprised that sqlite could not handle it while R could.
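Here is that comparison in sketch form, with the same made-up names as above: the kind of indexes I mean on the SQLite side, and the one-line merge on the R side.

```r
# Indexes on the join columns (same hypothetical schema as above)
library(DBI)
library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), "data.sqlite")

dbExecute(con, "CREATE INDEX IF NOT EXISTS idx_big_key ON big(key)")
dbExecute(con, "CREATE INDEX IF NOT EXISTS idx_small_key ON small(key)")
dbExecute(con, "ANALYZE")  # refresh the statistics the planner relies on
dbDisconnect(con)

# The equivalent merge in plain R; big_df and small_df stand for the
# same data loaded as data frames. all.x = TRUE makes it a left join.
merged <- merge(big_df, small_df, by = "key", all.x = TRUE)
```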

Sorry if this question is a bit foggy (I'm a bit foggy on databases), but if anyone has advice on something obvious I'm doing wrong that keeps me from taking advantage of sqlite, that would be great. Or am I just expecting too much of it, and is a 100 m x 1.4 m merge simply too big to execute without breaking it up?

I would have thought that a database would outperform R at this sort of thing?

Thanks!

EXL

+3
2 answers

I'm going through the same process right now. If you look at the questions I've asked recently, you may pick up some good pointers, or at least avoid a lot of the time I've wasted. :) In short, I've found these very helpful:

- RSQLite package

- RSQLite.extfuns package

- SQLite FAQ

That said, SQLite couldn't handle everything I threw at it either, and some manipulations were still easier to do back in R after reading the data in. If you want SQL syntax on ordinary data frames without managing a database yourself, take a look at the sqldf package (a minimal example is below); JD Long has posted some nice sqldf examples.
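A sqldf illustration, with made-up data frames just to show the shape of a call:

```r
# sqldf copies the data frames into a temporary SQLite database,
# runs the query there, and hands the result back as a data frame
library(sqldf)

df1 <- data.frame(id = 1:5, x = rnorm(5))
df2 <- data.frame(id = c(2, 4), y = c("a", "b"))

sqldf("SELECT df1.*, df2.y
       FROM df1
       LEFT JOIN df2 ON df1.id = df2.id")
```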

+3

A database is not automatically faster at everything; it depends on the workload. SQLite is an embedded engine that runs inside your process and stores the whole database in a single file on disk. SQLite was designed for simplicity, reliability, and a small footprint, not for heavy analytical queries. SQLite's query planner is also much simpler than those of the big client-server systems. SQLite can still handle joins of this size, though. First, you have to make sure it can actually use the indexes you built.

That is the part worth checking before giving up on SQLite. Having indexes on the columns is necessary but not sufficient; the planner has to choose them. An index it cannot use might as well not exist. SQLite executes a JOIN as a nested loop, so a JOIN with no usable index on the inner table degenerates into a full scan of that table for every row of the outer one, which at these sizes means hours. Run ANALYZE so the planner has statistics, and use EXPLAIN QUERY PLAN to see what it actually decided; a sketch of that check follows below.
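A sketch of that check, reusing the hypothetical big/small/key names from the question:

```r
# Ask SQLite how it intends to execute the join
library(DBI)
library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), "data.sqlite")

plan <- dbGetQuery(con, "
  EXPLAIN QUERY PLAN
  SELECT b.*, s.val
  FROM big b
  LEFT JOIN small s ON b.key = s.key")
print(plan)
# 'SEARCH ... USING INDEX' on the inner table is what you want to see;
# a plain 'SCAN' there means a full pass over it for every outer row.

dbDisconnect(con)
```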

If, after that, SQLite still can't do the job in reasonable time, it may simply be the wrong tool for this query, and a client-server database such as PostgreSQL or MySQL, with a more sophisticated planner, is worth considering. For a single analyst working on mostly static data, though, a well-indexed SQLite file usually goes a long way. It's a trade-off between convenience and horsepower.

Good luck; I hope that helps.

+2

Source: https://habr.com/ru/post/1777347/

