SQL loop to read, then write data to file

I have a dataset with 57 million rows and 23 columns. One column holds the species names of different birds (about 2,000 unique names), and for each unique species name I would like to select two data columns (latitude, longitude) and write the lat/long data for that species to a file, using the species name as the file name. Doing this in R, the only language I know, takes too much time. What would be appropriate code for this task?

Here is some pseudocode to illustrate roughly what I imagine the code might look like:

    FOR i IN unique(species_name)
        SELECT latitude, longitude WHERE species_name = [i]
        WRITE [some code that writes a text file with the species name as the file name]
    END LOOP;

I assume I could do something like this from an OS X terminal?

EDIT 20111211: Here is my workflow from R:

    require(RMySQL)
    require(plyr)

    drv <- dbDriver("MySQL")
    con <- dbConnect(drv, user = "asdfaf", dbname = "test", host = "localhost")

    # one species name per row in the CSV
    splist <- read.csv("splist_use.csv")

    # query lat/long for one species and write it to "<species>.csv"
    sqlwrite <- function(spname) {
      cat(spname)
      g1 <- dbGetQuery(con, paste("SELECT col_16, col_18 FROM dat WHERE col_11='",
                                  spname, "'", sep = ""))
      write.csv(g1, paste(spname, ".csv", sep = ""))
      rm("g1")
    }

    # iterate over the species-name column, with a text progress bar
    l_ply(splist[[1]], sqlwrite, .progress = "text")
3 answers

IMHO the best you can do is use a scripting language (Python, Perl, PHP, shell) and generate the file names and queries from there. It is not too difficult, but you will have to learn another language; SQL on its own is not well suited to this kind of imperative programming. A sketch of that approach follows.
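For illustration, a minimal Python sketch of that idea, assuming the MySQLdb driver is available and reusing the table and column names (dat, col_11, col_16, col_18) and connection details from the question; the header row and file layout are just examples:

    # Minimal sketch: one query per species, one CSV per species.
    import csv
    import MySQLdb

    con = MySQLdb.connect(user="asdfaf", db="test", host="localhost")
    cur = con.cursor()

    # fetch every distinct species name once
    cur.execute("SELECT DISTINCT col_11 FROM dat")
    species = [row[0] for row in cur.fetchall()]

    for name in species:
        cur.execute("SELECT col_16, col_18 FROM dat WHERE col_11 = %s", (name,))
        with open(name + ".csv", "w") as f:
            writer = csv.writer(f)
            writer.writerow(["latitude", "longitude"])
            writer.writerows(cur.fetchall())

    cur.close()
    con.close()

The same loop could be written in Perl or a shell script; the point is simply that the looping and file naming happen outside SQL.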


Have you tried MySQL's SELECT ... INTO OUTFILE functionality?

    SELECT col_16, col_18 FROM dat
    WHERE col_11 = 'speciesname'
    INTO OUTFILE '/tmp/speciesname.csv'
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n';

With a bit of work, you could get MySQL to select each unique species name, loop over the results, and output each one to its own CSV file.

You should also put an ORDER BY clause on your query. One way the loop might be driven is sketched below.
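A hypothetical sketch of driving that loop from Python (again with MySQLdb and the names from the question); note that INTO OUTFILE writes files on the database server, requires the FILE privilege, refuses to overwrite existing files, and the /tmp path is only an example:

    import MySQLdb

    con = MySQLdb.connect(user="asdfaf", db="test", host="localhost")
    cur = con.cursor()

    cur.execute("SELECT DISTINCT col_11 FROM dat")
    names = [row[0] for row in cur.fetchall()]

    for name in names:
        # the output path has to be spliced into the statement, so this
        # assumes species names are safe to use as file names
        cur.execute(
            "SELECT col_16, col_18 FROM dat WHERE col_11 = %s "
            "ORDER BY col_16, col_18 "
            "INTO OUTFILE '/tmp/" + name + ".csv' "
            "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\\n'",
            (name,),
        )

    cur.close()
    con.close()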


Is an Excel output file an option for you? If so, you can use Excel to connect to the database, issue your query to retrieve the data, and then save the result in .xls or .csv format. However, this assumes each result set is under about 1,000,000 rows.

In Excel, go to the Data tab, choose "From Other Sources", and pick your preferred connection method. From there you can define a table or query to run. Provided the results fit within the number of rows supported by your version of Excel, the data will be pulled down using the method you selected. It should be faster than the I/O you are currently doing.


Source: https://habr.com/ru/post/1385703/

