I am doing some text processing on a Unix system. I have command-line access to the machine, and it has Python, Perl, and the standard text-processing tools (awk, etc.).
I have a text file that looks like this:
2029754527851451717
2029754527851451717
2029754527851451717
2029754527851451717
2029754527851451717
2029754527851451717 1232453488239 Tue Mar 3 10:47:44 2009
2029754527851451717 1232453488302 Tue Mar 3 10:47:44 2009
2029754527851451717 1232453488365 Tue Mar 3 10:47:44 2009
2895635937120524206
2895635937120524206
2895635937120524206
2895635937120524206
2895635937120524206
2895635937120524206
5622983575622325494 1232453323986 Thu Feb 12 15:57:49 2009
There are basically 3 columns: ID, ID, date.
I want to delete every line that does not have both IDs and a date. The final result should look like this:
2029754527851451717 1232453488239 Tue Mar 3 10:47:44 2009
2029754527851451717 1232453488302 Tue Mar 3 10:47:44 2009
2029754527851451717 1232453488365 Tue Mar 3 10:47:44 2009
5622983575622325494 1232453323986 Thu Feb 12 15:57:49 2009
How would you go about doing this? The text file is typically about 30,000 lines long.
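To make the requirement concrete, here is a minimal Python sketch of the filter described above. It assumes that a complete line starts with two all-digit IDs and has at least one more field after them (the date), which matches the sample but may not hold for every line in the real file. The names `keep_complete_lines` and `filter_file` are made up for illustration.

```python
def keep_complete_lines(lines):
    """Yield only the lines that have two numeric IDs followed by a date."""
    for line in lines:
        fields = line.split()
        # Assumption: a complete line is ID, ID, then one or more date tokens
        # (e.g. "Tue Mar 3 10:47:44 2009"). Lines with a single ID are dropped.
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            yield line


def filter_file(src_path, dst_path):
    """Copy the complete lines from src_path to dst_path, one line at a time."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(keep_complete_lines(src))
```

Calling `filter_file("input.txt", "output.txt")` (both file names are placeholders) would write the filtered lines to a new file. The input is read line by line, so a file of about 30,000 lines should not be a problem.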
Regards,
Eef