Read/write/search/replace a huge CSV file

I have a huge (4.5 GB) CSV file. I need to do basic cut-and-paste and replace operations on some of its columns. The data is pretty well organized; the only problem is that I cannot work with it in Excel because of its size (2,000 rows, 550,000 columns).

Here is a sample of the data:

ID,Affection,Sex,DRB1_1,DRB1_2,SENum,SEStatus,AntiCCP,RFUW,rs3094315,rs12562034,rs3934834,rs9442372,rs3737728
D0024949,0,F,0101,0401,SS,yes,?,?,A_A,A_A,G_G,G_G
D0024302,0,F,0101,7,SN,yes,?,?,A_A,G_G,A_G,?_?
D0023151,0,F,0101,11,SN,yes,?,?,A_A,G_G,G_G,G_G

I need to:
- remove columns 4, 5, 6, 7, 8 and 9;
- replace every underscore (_) character from column 10 onwards with a space ( );
- replace every ? with zero (0);
- replace every comma with a tab;
- delete the first row (the one with the column names);
- in the 2nd column, replace each 0 with 1, each 1 with 2 and each ? with 0;
- in the 3rd column, replace F with 2, M with 1 and ? with 0.
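A minimal Java sketch of these per-row rules, assuming the rows are plain comma-separated values without quoted fields and that columns are numbered from 1 as in the list. The class and method names (`RowTransform`, `transformLine`) are hypothetical, values other than the ones listed are passed through unchanged (also an assumption), and `Map.of` needs Java 9 or later. Fields are joined with tabs as requested; the sample output further down shows spaces instead.

```java
import java.util.Map;

class RowTransform {
    // Column 2 (Affection): 0 -> 1, 1 -> 2, ? -> 0
    static final Map<String, String> AFFECTION = Map.of("0", "1", "1", "2", "?", "0");
    // Column 3 (Sex): F -> 2, M -> 1, ? -> 0
    static final Map<String, String> SEX = Map.of("F", "2", "M", "1", "?", "0");

    // Converts one comma-separated data row into a tab-separated output row.
    // Values not found in the maps are kept as they are (an assumption).
    static String transformLine(String line) {
        String[] f = line.split(",", -1); // -1 keeps trailing empty fields
        StringBuilder sb = new StringBuilder(line.length());
        sb.append(f[0])                                             // column 1: ID
          .append('\t').append(AFFECTION.getOrDefault(f[1], f[1])) // column 2
          .append('\t').append(SEX.getOrDefault(f[2], f[2]));      // column 3
        // Columns 4-9 (array indexes 3..8) are skipped.
        for (int i = 9; i < f.length; i++) {
            // Column 10 onwards: '?' -> '0' and '_' -> ' '
            sb.append('\t').append(f[i].replace('?', '0').replace('_', ' '));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] sample = {
            "D0024949,0,F,0101,0401,SS,yes,?,?,A_A,A_A,G_G,G_G",
            "D0024302,0,F,0101,7,SN,yes,?,?,A_A,G_G,A_G,?_?",
            "D0023151,0,F,0101,11,SN,yes,?,?,A_A,G_G,G_G,G_G"
        };
        for (String row : sample) {
            System.out.println(transformLine(row));
        }
    }
}
```

With 550,000 columns, `split` creates one `String` per field, so each row turns into hundreds of thousands of small objects. That is manageable when only one row is in memory at a time, but it is the main cost of this approach.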

So the resulting file should look like this:

D0024949 1 2 A A A A G G G G
D0024302 1 2 A A G G A G 0 0
D0023151 1 2 A A G G G G G G

(Both input and output have one record per line, with no extra empty lines.) Is there a memory-efficient way to do this in Java (I need code for this), or a useful tool for working with data this big that lets me easily apply Excel-like functions?
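On the memory question: 4.5 GB spread over about 2,000 rows means each row is roughly 2 MB, so even reading one line at a time is feasible. The sketch below goes a step further and processes the file as a character stream, holding only the current field in memory. It assumes unquoted CSV in UTF-8 (plain ASCII is a subset) and Java 9+; `StreamingCsvTransform` and its two command-line arguments (input path, output path) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;

class StreamingCsvTransform {
    static final Map<String, String> AFFECTION = Map.of("0", "1", "1", "2", "?", "0");
    static final Map<String, String> SEX = Map.of("F", "2", "M", "1", "?", "0");

    // Writes one finished field according to its 1-based column number.
    static void emitField(Writer out, int col, StringBuilder field) throws IOException {
        String v = field.toString();
        if (col == 1) {
            out.write(v);                                 // ID, unchanged
        } else if (col == 2) {
            out.write('\t');
            out.write(AFFECTION.getOrDefault(v, v));
        } else if (col == 3) {
            out.write('\t');
            out.write(SEX.getOrDefault(v, v));
        } else if (col >= 10) {
            out.write('\t');
            out.write(v.replace('?', '0').replace('_', ' '));
        }
        // Columns 4-9 fall through and are dropped.
    }

    // Reads CSV from `in` and writes the transformed tab-separated rows to `out`.
    static void transform(Reader in, Writer out) throws IOException {
        char[] buf = new char[1 << 16];
        boolean inHeader = true;      // skip everything up to the first '\n'
        boolean lineStarted = false;  // current data row has some content
        int col = 1;
        StringBuilder field = new StringBuilder();
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (inHeader) {
                    if (c == '\n') inHeader = false;
                } else if (c == '\r') {
                    // ignore CR so Windows line endings also work
                } else if (c == ',') {
                    emitField(out, col++, field);
                    field.setLength(0);
                    lineStarted = true;
                } else if (c == '\n') {
                    if (lineStarted) {            // blank lines produce no output
                        emitField(out, col, field);
                        out.write('\n');
                    }
                    field.setLength(0);
                    col = 1;
                    lineStarted = false;
                } else {
                    field.append(c);
                    lineStarted = true;
                }
            }
        }
        if (lineStarted) {                        // last row without a trailing newline
            emitField(out, col, field);
            out.write('\n');
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            transform(in, out);
        }
    }
}
```

After compiling, it could be run as, for example, `java StreamingCsvTransform data.csv data.txt`. The 64 KB chunk size is an arbitrary choice.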

1 answer

You need two things:
- Knowledge of regular expressions (aka Regex, Regexes)
- PowerGrep
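The answer gives no concrete patterns. As an illustration only, the requested edits can be written as an ordered list of regex search-and-replace passes, the kind of sequence one might set up in a regex tool such as PowerGrep. This sketch uses Java's `java.util.regex` rather than PowerGrep itself, assumes unquoted CSV, and the class name `RegexPasses` is hypothetical. Deleting the header row is not covered by the passes; a caller would simply skip the first line.

```java
import java.util.regex.Pattern;

class RegexPasses {
    // {search regex, replacement}, applied in this order to each data row.
    private static final String[][] RULES = {
        // Drop columns 4-9: keep the first three fields, consume the next six.
        {"^([^,]*),([^,]*),([^,]*)(?:,[^,]*){6}", "$1,$2,$3"},
        // Column 2: 1 -> 2 must run before 0 -> 1,
        // otherwise every original 0 would end up as 2.
        {"^([^,]*),1,", "$1,2,"},
        {"^([^,]*),0,", "$1,1,"},
        // Column 3: F -> 2, M -> 1.
        {"^([^,]*),([^,]*),F,", "$1,$2,2,"},
        {"^([^,]*),([^,]*),M,", "$1,$2,1,"},
        // Whole row: ? -> 0 (this also covers '?' in columns 2 and 3,
        // because it runs after the column-2 mapping), _ -> space, comma -> tab.
        {"\\?", "0"},
        {"_", " "},
        {",", "\t"},
    };

    private static final Pattern[] PATTERNS = new Pattern[RULES.length];
    static {
        for (int i = 0; i < RULES.length; i++) {
            PATTERNS[i] = Pattern.compile(RULES[i][0]);
        }
    }

    // Applies every pass in sequence to one row.
    static String transformLine(String line) {
        String s = line;
        for (int i = 0; i < RULES.length; i++) {
            s = PATTERNS[i].matcher(s).replaceAll(RULES[i][1]);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(transformLine("D0024302,0,F,0101,7,SN,yes,?,?,A_A,G_G,A_G,?_?"));
    }
}
```

The order of the passes is the subtle part of a pure search-and-replace approach: each pass sees the output of the previous one, so overlapping mappings such as 0 → 1 and 1 → 2 have to be sequenced by hand.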


Source: https://habr.com/ru/post/1748153/

